NO-GAPS DELOCALIZATION FOR GENERAL RANDOM MATRICES


MARK RUDELSON AND ROMAN VERSHYNIN 


Abstract. We prove that with high probability, every eigenvector of a random matrix is delocalized 
in the sense that any subset of its coordinates carries a non-negligible portion of its $\ell_2$ norm. Our
results pertain to a wide class of random matrices, including matrices with independent entries, 
symmetric and skew-symmetric matrices, as well as some other naturally arising ensembles. The 
matrices can be real and complex; in the latter case we assume that the real and imaginary parts 
of the entries are independent. 
1. Introduction

While eigenvalues of random matrices have been extensively studied since the 1950s (see [1, 3,
30] for introduction), less is known about eigenvectors of random matrices. For matrices whose 
distributions are invariant under unitary or orthogonal transformations, the picture is trivial: their 
normalized eigenvectors are uniformly distributed over the unit Euclidean sphere. Examples of such 
random matrices include the classical Gaussian Unitary Ensemble (GUE), Gaussian Orthogonal 
Ensemble (GOE) and Ginibre ensembles. All entries of these matrices are normal, and either all 
of them are independent (in Ginibre ensemble) or independence holds modulo symmetry (in GUE 
and GOE). 

Guided by the ubiquitous universality phenomenon in random matrix theory (see [33, 35, 18, 10]), 
we can anticipate that the eigenvectors behave in a similar way for a much broader class of random 
matrices. Thus, for a general $n \times n$ random matrix $A$ with independent entries we may expect that
the normalized eigenvectors are approximately uniformly distributed on the unit sphere. The same 
should hold for a general Wigner matrix $A$, a symmetric random matrix with independent entries
on and above the diagonal. 

The uniform distribution on the unit sphere has several remarkable properties. Showing that 
the eigenvectors of general random matrices have these properties, too, became a focus of attention 
in recent years [16, 17, 11, 12, 6, 13, 14, 15, 36, 39, 5, 7, 28]. One such property is
delocalization in the sup-norm. For a random vector $v$ uniformly distributed on the unit sphere, a
quick check reveals that no coefficients can be too large; in particular $\|v\|_\infty = O(\sqrt{\log n}/\sqrt{n})$ holds
with high probability. Establishing a similar delocalization property for eigenvectors of random
matrices is a challenging task. For eigenvectors of Hermitian random matrices, a weaker bound
$\|v\|_\infty = O(\log^\gamma n/\sqrt{n})$, with $\gamma = O(1)$, was shown by Erdős et al. [16, 17] using spectral methods.
Later, Vu and Wang [39] obtained the optimal exponent $\gamma = 1/2$ for most eigenvectors (those
corresponding to the bulk of the spectrum). Recently, the authors of the current paper established 
delocalization for random matrices with all independent entries by developing a completely different, 
geometric approach [28]. 

1.1. No-gaps delocalization. In the present paper, we will address a different natural delocalization
property. Examining a random vector uniformly distributed on the sphere, we may notice
that its mass (the $\ell_2$ norm) is more or less evenly spread over the coordinates. There are no “gaps”
in the sense that all subsets $J \subset [n]$ carry a non-negligible portion of the mass.

The goal of this paper is to establish this property for eigenvectors of random matrices. Formally, 
we would like to show that with high probability, for any eigenvector $v$, any $\varepsilon \in (0, 1)$, and any
subset of coordinates $J \subset [n]$ of size at least $\varepsilon n$, one has

\[ \|v_J\|_2 = \Big( \sum_{j \in J} |v_j|^2 \Big)^{1/2} \ge \phi(\varepsilon) \|v\|_2, \]

where $\phi : (0,1) \to (0,1)$ is some nice function. We call this phenomenon no-gaps delocalization.
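
To get a feel for the quantity that no-gaps delocalization controls, the following Python sketch samples a vector uniformly from the real unit sphere (by normalizing independent Gaussians) and computes the smallest mass carried by any subset of a given relative size. The minimum over all subsets is attained on the coordinates of smallest magnitude, so no search over subsets is needed. The function names, the dimension 2000, the proportion 0.1 and the seed are illustrative assumptions, not taken from the paper.

```python
import math
import random

def random_unit_vector(n, rng):
    """Sample a vector uniformly from the real unit sphere by normalizing Gaussians."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def worst_subset_mass(v, eps):
    """Smallest l2 mass over all coordinate subsets J with |J| = round(eps * n).

    The minimum is attained by taking the coordinates of smallest magnitude.
    """
    k = max(1, round(eps * len(v)))
    smallest = sorted(x * x for x in v)[:k]
    return math.sqrt(sum(smallest))

rng = random.Random(0)          # fixed seed, chosen only for reproducibility
v = random_unit_vector(2000, rng)
eps = 0.1
typical = math.sqrt(sum(x * x for x in v[:200]))   # one fixed subset of size eps * n
worst = worst_subset_mass(v, eps)
print(f"fixed subset mass: {typical:.4f}, worst subset mass: {worst:.4f}")
```

In this experiment the worst subset still carries a positive fraction of the norm, although a much smaller one than a fixed subset of the same size; this is the kind of lower bound that the function in the displayed inequality is meant to capture.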

One may wonder about the relation of the no-gaps delocalization to the delocalization in the 
sup-norm we mentioned before. As is easy to see, neither of these two properties implies the other. 
They offer complementary insights into the behavior of the coefficients of the eigenvectors - one 
property rules out peaks and the other rules out gaps. 
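
Two deterministic unit vectors make the independence of the two properties concrete. The sketch below is only an illustration; the vectors, the dimension and the helper names are assumptions introduced here. The first vector has no peaks but a gap, the second has a peak but no gaps.

```python
import math

def sup_norm(v):
    """Largest coordinate in absolute value."""
    return max(abs(x) for x in v)

def worst_subset_mass(v, k):
    """Smallest l2 mass over all coordinate subsets of size k."""
    return math.sqrt(sum(sorted(x * x for x in v)[:k]))

n, k = 1000, 100   # k plays the role of eps * n with eps = 0.1

# No peaks, but a gap: mass spread evenly over the first half, zero elsewhere.
flat_with_gap = [math.sqrt(2.0 / n)] * (n // 2) + [0.0] * (n // 2)

# No gaps, but a peak: half of the squared mass on one coordinate,
# the other half spread evenly over the remaining n - 1 coordinates.
peak_no_gap = [math.sqrt(0.5)] + [math.sqrt(0.5 / (n - 1))] * (n - 1)

for name, v in [("flat_with_gap", flat_with_gap), ("peak_no_gap", peak_no_gap)]:
    print(name, "sup-norm:", round(sup_norm(v), 4),
          "worst subset mass:", round(worst_subset_mass(v, k), 4))
```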

The need for no-gaps delocalization arises naturally in problems of spectral graph theory. A 
similar notion appeared in the pioneering work of Dekel et al. [8]. The desirability of establishing
no-gaps delocalization was emphasized in a paper of Arora and Bhaskara [2], where a similar but
weaker property was proved for a fixed subset $J$. Very recently, Eldan et al. [9] established a
weaker form of no-gaps delocalization for the Laplacian of an Erdős-Rényi graph with $\varepsilon > 1/2$
and the function $\phi$ depending on $\varepsilon$ and $n$. This delocalization has been used to prove a version
of a conjecture of Chung on the influence of adding or deleting edges on the spectral gap of an
Erdős-Rényi graph. For shifted Wigner matrices and one-element sets $J$, the no-gaps delocalization
was proved by Nguyen et al. [20] with $\phi(1/n) = (1/n)^C$ for some absolute constant $C$.

In the present paper, we prove the no-gaps delocalization for a wide set of ensembles of random
matrices including matrices with independent entries, symmetric and skew-symmetric random
matrices, and others. Explicitly, we make the following assumption about possible dependencies
among the entries.

Assumption 1.1 (Dependences of entries). Let $A$ be an $n \times n$ random matrix. Assume that for
any $i, j \in [n]$, the entry $A_{ij}$ is independent of the rest of the entries except possibly $A_{ji}$. We also
assume that the real part of $A$ is random and the imaginary part is fixed.

Note that Assumption 1.1 implies the following important independence property, which we will
repeatedly use later: for any $J \subset [n]$, the entries of the submatrix $A_{J \times J^c}$ are independent.

Fixing the imaginary part in Assumption 1.1 allows us to handle real random matrices. This
assumption can also be arranged for complex matrices with independent real and imaginary parts, 
once we condition on the imaginary part. One can even consider a more general situation where 
the real parts of the entries conditioned on the imaginary parts have variances bounded below. 

We will also assume $\|A\| = O(\sqrt{n})$ with high probability. This natural condition holds, in
particular, if the entries of $A$ have mean zero and bounded fourth moments [19]. To make this
rigorous, we fix a number $M > 1$ and introduce the boundedness event

\[ \mathcal{B}_{A,M} := \{ \|A\| \le M\sqrt{n} \}. \tag{1.1} \]

1.2. Main results. Let us start with the simpler case where matrix entries have continuous distributions.
This will allow us to present the method in the most transparent way, without having
to navigate numerous obstacles that arise for discrete distributions. 

Assumption 1.2 (Continuous distributions). We assume that the real parts of the matrix entries 
have densities bounded by some number K > 1. 

Under Assumptions 1.1 and 1.2, we show that every subset of at least eight coordinates carries
a non-negligible part of the mass of any eigenvector. This is summarized in the following theorem. 

Theorem 1.3 (Delocalization: continuous distributions). Let $A$ be an $n \times n$ random matrix which
satisfies Assumptions 1.1 and 1.2. Choose $M > 1$ such that the boundedness event $\mathcal{B}_{A,M}$ holds with
probability at least 1/2. Let $\varepsilon \in (8/n, 1/2)$ and $s > 0$. Then, conditionally on $\mathcal{B}_{A,M}$, the following
holds with probability at least $1 - (cs)^{\varepsilon n}$. Every eigenvector $v$ of $A$ satisfies

\[ \|v_I\|_2 \ge (\varepsilon s)^6 \|v\|_2 \quad \text{for all } I \subset [n],\ |I| \ge \varepsilon n. \]

Here $c = c(K, M) > 0$.

The restriction $\varepsilon < 1/2$ can be easily removed, see Remark 1.6 below.

Note that we do not require any moments for the matrix entries, so heavy-tailed distributions are 
allowed. However, the boundedness assumption formalized by (1.1) implicitly yields some upper 
bound on the tails. Indeed, if the entries of $A$ are i.i.d. and mean zero, then $\|A\| = O(\sqrt{n})$ can
only hold if the fourth moments of entries are bounded [4].

Further, we do not require that the entries of $A$ have mean zero. Therefore, adding to $A$ any
fixed matrix of norm $O(\sqrt{n})$ does not affect our results.

Extending Theorem 1.3 to general, possibly discrete distributions, is a challenging task. We are 
able to do this for matrices with identically distributed entries and under the mild assumption that 
the distributions of entries are not too concentrated near a single number. 

Assumption 1.4 (General distribution of entries). We assume that the real parts of the matrix
entries are i.i.d. copies of a random variable $\xi$, which satisfies

\[ \sup_{u \in \mathbb{R}} \mathbb{P}\{ |\xi - u| < 1 \} \le 1 - p, \qquad \mathbb{P}\{ |\xi| > K \} \le p/2 \qquad \text{for some } K, p > 0. \tag{1.2} \]

Among many examples of discrete random variables $\xi$ satisfying Assumption 1.4, the most prominent
one is the symmetric Bernoulli random variable $\xi$, which takes values $-1$ and $1$ with probability
1/2 each.
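
For a finitely supported distribution, both conditions in (1.2) can be checked exactly. The following Python sketch does this for the symmetric Bernoulli variable, reading the first condition with a strict inequality inside the probability. The helper names and the choice $K = 1$, $p = 1/2$ are assumptions made for illustration; the paper does not fix these values.

```python
from fractions import Fraction

def max_open_window_mass(dist, radius):
    """sup over u of P{|xi - u| < radius} for a finitely supported xi.

    dist maps atoms to probabilities. An open interval of length 2*radius
    whose smallest atom is a can only contain atoms in [a, a + 2*radius),
    and every such half-open window is realized by some center u.
    """
    atoms = sorted(dist)
    best = Fraction(0)
    for a in atoms:
        mass = sum(dist[b] for b in atoms if a <= b < a + 2 * radius)
        best = max(best, mass)
    return best

def satisfies_assumption(dist, K, p):
    """Check the two conditions of (1.2) for a finitely supported distribution."""
    spread = max_open_window_mass(dist, 1) <= 1 - p
    tail = sum(q for x, q in dist.items() if abs(x) > K) <= p / 2
    return spread and tail

bernoulli = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
constant = {0: Fraction(1)}          # a degenerate variable, for contrast
print(max_open_window_mass(bernoulli, 1))
print(satisfies_assumption(bernoulli, K=1, p=Fraction(1, 2)))
print(satisfies_assumption(constant, K=1, p=Fraction(1, 2)))
```

The degenerate variable fails the first condition because all of its mass sits at a single number, which is exactly the situation the assumption is designed to exclude.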

With Assumption 1.2 replaced by 1.4, we can prove the no-gaps delocalization result, which we 
summarize in the following theorem. 

Theorem 1.5 (Delocalization: general distributions). Let $A$ be an $n \times n$ random matrix which
satisfies Assumptions 1.1 and 1.4. Choose $M > 1$ such that the boundedness event $\mathcal{B}_{A,M}$ holds with
probability at least 1/2. Let $\varepsilon > 1/n$ and $s > c_1 \varepsilon^{-7/6} n^{-1/6} + e^{-c_2/\sqrt{\varepsilon}}$. Then, conditionally on $\mathcal{B}_{A,M}$,
the following holds with probability at least $1 - (c_3 s)^{\varepsilon n}$. Every eigenvector $v$ of $A$ satisfies

\[ \|v_I\|_2 \ge (\varepsilon s)^6 \|v\|_2 \quad \text{for all } I \subset [n],\ |I| \ge \varepsilon n. \]

Here $c_k = c_k(p, K, M) > 0$ for $k = 1, 2, 3$.

Remark 1.6. The restriction $s < 1/c_3$ making the theorem meaningful implies that $\varepsilon \in (c_4 n^{-1/7}, c_5)$
for some $c_4 > 0$ and $c_5 < 1$. The upper bound, however, can be easily removed. If $\varepsilon > c_5$, then
the delocalization event

\[ \|v_I\|_2 \ge c_6 \|v\|_2 \quad \text{for all } I \subset [n],\ |I| \ge \varepsilon n \]

holds with probability at least $1 - e^{-c_7 n}$. This follows by applying Theorem 1.5 with a sufficiently
small constant $\varepsilon = c_7$, which allows us to choose $s = e^{-1}/c_3$.

The restrictions on $\varepsilon$ and $s$ can be significantly relaxed; see the end of Section 6. We did not
attempt to optimize these bounds, striving for clarity of the argument in lieu of more precise 
estimates. 


2. Outline of the argument 

Our approach to Theorems 1.3 and 1.5 is based on reducing delocalization to invertibility of 
random matrices. We will now informally explain this reduction, which is quite flexible and can be 
applied for many classes of random matrices. 

2.1. Reduction of delocalization to invertibility. Let us argue by contradiction. Suppose
there exists a localized unit eigenvector $v$ of $A$, which means that

\[ \|v_I\|_2 = o(1) \quad \text{for some index subset } I \subset [n],\ |I| = \varepsilon n. \tag{2.1} \]

Let us decompose the matrix$^1$ $B := A - \lambda$ into two sub-matrices, $B_I$ that consists of columns
indexed by $I$ and $B_{I^c}$ with columns indexed by $I^c$. Then

\[ 0 = Bv = B_I v_I + B_{I^c} v_{I^c}. \tag{2.2} \]

To estimate the norm of $B_I v_I$, note that the operator norm of $B$ can be bounded as

\[ \|B_I\| \le \|B\| \le 2\|A\| = O(\sqrt{n}) \quad \text{with high probability,} \]

where we used the boundedness event (1.1). Combining with (2.1), we obtain

\[ \|B_I v_I\|_2 = o(\sqrt{n}). \]

But the identity (2.2) implies that the norms of $B_I v_I$ and $B_{I^c} v_{I^c}$ are the same, thus

\[ \|B_{I^c} v_{I^c}\|_2 = o(\sqrt{n}). \tag{2.3} \]

Since $v$ is a unit vector and $v_I$ has a small norm, the norm of $v_{I^c}$ is close to 1. Then (2.3) implies
that the matrix $B_{I^c}$ is not well invertible on its range. Formally, this can be expressed as a bound
on the smallest singular value:

\[ s_{\min}(B_{I^c}) = o(\sqrt{n}). \tag{2.4} \]

Recall that $B_{I^c}$ is an $n \times (n - \varepsilon n)$ random matrix. Thus we reduced delocalization to quantitative
invertibility of almost square random matrices.
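
The chain of estimates in this outline can be collected in a single line. The display below is only a summary sketch of the steps above, written for a unit eigenvector $v$ with eigenvalue $\lambda$ and $B = A - \lambda$. It uses the variational formula $s_{\min}(B_{I^c}) = \min_{\|x\|_2 = 1} \|B_{I^c} x\|_2$, which is standard but is not stated explicitly in the text.

```latex
\[
  s_{\min}(B_{I^c})
  \le \frac{\|B_{I^c} v_{I^c}\|_2}{\|v_{I^c}\|_2}
  = \frac{\|B_I v_I\|_2}{\|v_{I^c}\|_2}
  \le \frac{\|B\| \, \|v_I\|_2}{\sqrt{1 - \|v_I\|_2^2}}
  = \frac{O(\sqrt{n}) \cdot o(1)}{1 - o(1)}
  = o(\sqrt{n}).
\]
```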


$^1$ For convenience of notation, we skip the identity symbol, thus writing $A - \lambda$ for $A - \lambda I$.
2.2. Invertibility of random matrices. A standard expectation in the non-asymptotic random
matrix theory is that random matrices are well invertible, and the bad event (2.4) should not hold.
For example, if the $n \times (n - \varepsilon n)$ random matrix $H = B_{I^c}$ had all independent standard normal
entries, then we would have the desired lower bound

\[ s_{\min}(H) = \Omega(\sqrt{n}) \quad \text{with high probability,} \tag{2.5} \]

see e.g. [37]. Invertibility results similar to (2.5) are now available for distributions more general
than Gaussian (see [25]), and in particular for discrete distributions. Handling discrete distributions
in the invertibility problems is considerably more challenging than continuous ones. Recent successes in
these problems were based on understanding the interaction of probability with arithmetic structure,
which was quantified via generalized arithmetic progressions in [32, 31] and approximate least
common denominators (LCD) in [23, 25, 38]; see [33, 26] for background and references.

Nevertheless, there are significant difficulties in our situation that prevent us from deducing (2.5)
for $H = B_{I^c}$ from any previous work. Let us mention some of these difficulties.

2.2.1. Lack of independence. Not all entries of A (and thus of H ) may be independent. As we recall 
from Assumption 1.1, we are looking for ways to control symmetric and non-symmetric matrices 
simultaneously. This makes it necessary to extract rectangular blocks of independent entries from 
matrix H and modify the definition of the LCD adapting it to this block extraction. 

2.2.2. Small exceptional probability required. We need that the delocalization result, and thus the
invertibility bound (2.5), hold uniformly over all index subsets $I$ of size $\varepsilon n$. Since there are
$\binom{n}{\varepsilon n} \sim \varepsilon^{-\varepsilon n}$ such sets, we would need the probability of non-invertibility (2.4) to be at most
$\varepsilon^{\varepsilon n}$. While this is possible to achieve for real matrices with all independent entries [25], such small
exceptional probabilities (smaller than $e^{-\varepsilon n}$) may not come automatically for the general case.
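
The size of the union bound can be checked numerically. The sketch below compares the logarithm of the number of index subsets of size $\varepsilon n$ with the elementary bounds $(1/\varepsilon)^{\varepsilon n}$ and $(e/\varepsilon)^{\varepsilon n}$; the values of $n$ and $\varepsilon$ and the helper name are illustrative assumptions.

```python
import math

def log_binomial(n, k):
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

n = 1000
for eps in (0.01, 0.05, 0.1, 0.2):
    k = round(eps * n)
    exact = log_binomial(n, k)
    lower = k * math.log(n / k)            # log of (1 / eps)^(eps n)
    upper = k * math.log(math.e * n / k)   # log of (e / eps)^(eps n)
    print(f"eps={eps}: log C(n, eps n) = {exact:.1f}, "
          f"between {lower:.1f} and {upper:.1f}")
```

The exceptional probability for a single subset therefore has to beat a factor of order $(e/\varepsilon)^{\varepsilon n}$, which is why probabilities of order $\varepsilon^{\varepsilon n}$ are needed.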

2.2.3. Complex entries. Results of the type (2.5) which hold with the probability we need are available
only for real matrices; see in particular [26, 37, 22]. Since eigenvalues $\lambda$ even of real matrices
may be complex, we must work with complex random matrices. Extending the known results to
complex matrices is non-trivial. Indeed, in order to preserve the matrix-vector multiplication, we
replace a complex $n \times N$ random matrix $B = R + iT$ by the real $2n \times 2N$ random matrix $\begin{bmatrix} R & -T \\ T & R \end{bmatrix}$.
The real and imaginary parts $R$ and $T$ each appear twice in this matrix, which causes extra dependences
of the entries. Moreover, we encounter a major problem while trying to apply the covering
argument to show that the least common denominator of the subspace orthogonal to a certain set
of columns of $H$ is large. Indeed, since we have to consider a real $2n \times 2N$ matrix, we will have to
construct a net in a subset of the real sphere of dimension $2N$. The size of such net is exponential in
the dimension. On the other hand, the number of independent rows of $R$ is only $n$, so the small ball
probability will be exponential in terms of $n$. As $n < N$, the union bound would not be applicable.
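
The claim that the real block matrix preserves matrix-vector multiplication is easy to verify directly. The following Python sketch builds the $2n \times 2N$ real matrix from a small complex matrix and compares the two products; the helper names, the sizes and the Gaussian test data are assumptions made for illustration.

```python
import random

def realify_matrix(B):
    """Map a complex n x N matrix B = R + iT to the real 2n x 2N block matrix [[R, -T], [T, R]]."""
    top = [[b.real for b in row] + [-b.imag for b in row] for row in B]
    bottom = [[b.imag for b in row] + [b.real for b in row] for row in B]
    return top + bottom

def realify_vector(z):
    """Map z = x + iy in C^N to the stacked real vector (x, y) in R^(2N)."""
    return [c.real for c in z] + [c.imag for c in z]

def matvec(M, v):
    """Plain matrix-vector product for lists of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

rng = random.Random(1)   # fixed seed, for reproducibility only
n, N = 3, 5
B = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)] for _ in range(n)]
z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]

lhs = matvec(realify_matrix(B), realify_vector(z))   # real matrix times real vector
rhs = realify_vector(matvec(B, z))                   # complex product, then made real
max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print("largest discrepancy:", max_gap)
```

Each of the blocks built from the real and imaginary parts appears twice in the real matrix, which is the source of the extra dependences mentioned above.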

To overcome this difficulty, we introduce a stratification of the complex sphere, partitioning it 
according to the correlation between the real and the imaginary parts of vectors. This stratification, 
combined with a modified definition of the least common denominator, allows us to obtain stronger 
small ball probability estimates for weakly correlated vectors in Section 10. Yet the set of weakly 
correlated vectors has a larger complexity, which is expressed in the size of the nets. The cardinality 
of the nets has to be accurately estimated in Section 11. These two effects, the improvement of 
the small ball probability estimate and the increase of the complexity, work against each other. In 
Section 12, we show that they exactly balance each other, making it possible to apply the union 
bound. 

2.3. Organization of the argument. After discussing basic background material in Section 3, 
we present a formal reduction of delocalization to invertibility in Section 4. The rest of the paper 
will focus on invertibility of random matrices. Section 5 covers continuous distributions; the main 
result there is Invertibility Theorem 5.1, from which we quickly deduce Delocalization Theorem 1.3.
These sections are relatively simple and can be read independently of the rest of the paper. 

Invertibility of random matrices with general distributions is considerably more difficult. We 
address this problem in Sections 6-13. The main result there is Invertibility Theorem 6.1, from
which we quickly deduce Delocalization Theorem 1.5. 

Our general approach to invertibility follows the method developed by the authors in [23, 25],
see also [26]. We reduce proving invertibility to the distance problem, where we seek a lower bound
on $\mathrm{dist}(Z, E)$ where $Z$ is a random vector with independent coordinates and $E$ is an independent
random subspace in $\mathbb{R}^N$. If we choose $E$ to be a hyperplane (subspace of codimension one), we
obtain an important class of examples in the distance problem, namely sums of independent random
variables.

In Section 7 we study small ball probabilities for sums of real-valued independent random variables,
as well as their higher dimensional versions $\mathrm{dist}(Z, E)$. These probabilities are controlled by
the arithmetic structure of $E^\perp$, which we quantify via the so-called least common denominator (LCD)
of $E^\perp$. The larger the LCD, the more $E^\perp$ is arithmetically unstructured, and the better are the small ball
probabilities for $\mathrm{dist}(Z, E)$. We formalize this relation in the very general Theorem 7.5, and then
we specialize in Sections 7.3 and 7.4 to sums of independent random variables and distances to
subspaces.

In Section 8, we state our main bound on the distance between random vectors and subspaces; 
this is Theorem 8.1. In order to deduce this result from the small ball probability bounds of 
Section 7, two things need to be done: (a) transfer the problem from complex to real, and (b) 
show that random subspaces are arithmetically unstructured, i.e. the LCD of $E^\perp$ is large. The
transfer to a real problem is done in Section 8.1, and then our main focus becomes the structure 
of subspaces. 

By the nature of our problem, the subspaces $E^\perp$ will be the kernels of random matrices. The
analysis of such kernels starts in Section 9. We show there that all vectors in $E^\perp$ are incompressible,
which means that they are not localized on a small fraction of coordinates.

Unfortunately, the process of transferring the problem from complex to real in Section 8.1
introduces extra dependences among the entries of the random matrix. In Section 10 we adjust our
results on small ball probabilities so they are not destroyed by those dependences. We find that
these probabilities are controlled not only by the LCD but also by real-imaginary correlations of the
vectors in $E^\perp$.

Recall that our goal is to show that all vectors in $E^\perp = \ker(B)$ are unstructured, i.e. they have
large LCD. We would obtain this if we can lower-bound $Bz$ for all vectors $z$ with small LCD. For a
fixed $z$, a lower bound follows from the small ball probability results of Section 10. To make the
bound uniform, it is enough to run a union bound over a good net of the set of vectors with small
LCD. We construct a good net for level sets of LCD and real-imaginary correlations in Section 11.
Informally, small LCD or small correlation impose strong constraints, which make it possible to
construct a smaller net than the one based on the trivial (volume-based) argument.

After this major step, the argument can be wrapped up relatively easily. In Section 12 we finalize 
the distance problem. We combine the small ball probability results with the fact that the random 
subspaces are unstructured, and deduce Theorem 8.1.

In Section 13 we finalize the invertibility problem for general distributions; here we deduce 
Theorem 6.1. This is done by modifying the argument for continuous distributions in Section 5 
using the non-trivial distance bound Theorem 8.1 for general distributions. 
3. Notation and preliminaries


Throughout the paper, by $C, c, C_1, \ldots$ we denote constants that may depend only on the parameters
$K$ and $p$ that control the distributions of matrix entries in Assumptions 1.2 and 1.4 and the
parameter $M$ that controls the matrix norm in (1.1).

We denote by $S_{\mathbb{R}}^{n-1}$ and $S_{\mathbb{C}}^{n-1}$ the unit spheres of $\mathbb{R}^n$ and $\mathbb{C}^n$ respectively. We denote by $B(a, r)$
the Euclidean ball in $\mathbb{R}^n$ centered at a point $a$ and with radius $r$. The unit sphere of a subspace $E$
will be denoted $S_E$, and the orthogonal projection onto a subspace $E$ by $P_E$.

Given an $n \times m$ matrix $A$ and an index set $J \subset [m]$, by $A_J$ we denote the $n \times |J|$ sub-matrix of
$A$ obtained by including the columns indexed by $J$. Similarly, for a vector $z \in \mathbb{C}^n$, by $z_J$ we denote
the vector in $\mathbb{C}^J$ which consists of the coefficients indexed by $J$.


3.1. Concentration function. The concept of concentration function has been introduced by
P. Lévy and studied in probability theory for several decades, see [26] for the classical and recent
history.

Definition 3.1 (Concentration function). Let $Z$ be a random vector taking values in $\mathbb{C}^n$. The
concentration function of $Z$ is defined as

\[ \mathcal{L}(Z, t) = \sup_{u \in \mathbb{C}^n} \mathbb{P}\{ \|Z - u\|_2 \le t \}, \quad t > 0. \]

The concentration function gives a uniform upper bound on the small ball probabilities for $Z$.
We defer a detailed study of concentration function for sums of independent random variables to
Section 7. Let us mention here only one elementary restriction property.

Lemma 3.2 (Small ball probabilities: restriction). Let $\xi_1, \ldots, \xi_N$ be independent random variables
and $a_1, \ldots, a_N$ be real numbers. Then, for every subset of indices $J \subset [N]$ and every $t > 0$ we have

\[ \mathcal{L}\Big( \sum_{j \in J} a_j \xi_j, t \Big) \ge \mathcal{L}\Big( \sum_{j=1}^N a_j \xi_j, t \Big). \]

Proof. This bound follows easily by conditioning on the random variables $\xi_j$ with $j \notin J$ and
absorbing their contribution into a fixed vector $u$ in the definition of the concentration function. □
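
The restriction property can be checked exactly on a small example. The Python sketch below computes the law of a signed sum of symmetric Bernoulli variables by enumeration and evaluates the concentration function, taken here with a closed ball, for the full sum and for a sub-sum. The coefficients, the subset and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def sum_distribution(coeffs):
    """Exact law of sum_j a_j * xi_j for independent symmetric signs xi_j = +-1."""
    law = {}
    weight = Fraction(1, 2 ** len(coeffs))
    for signs in product((-1, 1), repeat=len(coeffs)):
        value = sum(a * s for a, s in zip(coeffs, signs))
        law[value] = law.get(value, Fraction(0)) + weight
    return law

def concentration(law, t):
    """sup over u of P{|S - u| <= t} for a finitely supported real random variable S.

    A closed window of length 2t whose smallest atom is a lies inside
    [a, a + 2t], so it suffices to scan windows that start at an atom.
    """
    atoms = sorted(law)
    return max(sum(law[b] for b in atoms if a <= b <= a + 2 * t) for a in atoms)

a = [1, 1, 1, 1]
J = [0, 1]                                   # indices kept in the restricted sum
full = sum_distribution(a)
restricted = sum_distribution([a[j] for j in J])
t = Fraction(1, 2)
print(concentration(restricted, t), ">=", concentration(full, t))
```

Adding further independent terms can only spread the distribution out, so the sub-sum is at least as concentrated as the full sum.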


We will also use a simple and useful tensorization property which goes back to [21, 23]. 


Lemma 3.3 (Tensorization). Let $Z = (Z_1, \ldots, Z_n)$ be a random vector in $\mathbb{C}^n$ with independent
coordinates. Assume that there exist numbers $t_0, M > 0$ such that

\[ \mathcal{L}(Z_j, t) \le M(t + t_0) \quad \text{for all } j \text{ and } t > 0. \]

Then

\[ \mathcal{L}(Z, t\sqrt{n}) \le [CM(t + t_0)]^n \quad \text{for all } t > 0. \]

Proof. By translation, we can assume without loss of generality that $u = 0$ in the definition of
concentration function. Thus we want to bound the probability

\[ \mathbb{P}\{ \|Z\|_2 \le t\sqrt{n} \} = \mathbb{P}\Big\{ \sum_{j=1}^n |Z_j|^2 \le t^2 n \Big\}. \]

Rearranging the terms, using Markov’s inequality and then independence, we can bound this probability
by

\[ \mathbb{P}\Big\{ n - \frac{1}{t^2} \sum_{j=1}^n |Z_j|^2 \ge 0 \Big\} \le \mathbb{E} \exp\Big( n - \frac{1}{t^2} \sum_{j=1}^n |Z_j|^2 \Big) = e^n \prod_{j=1}^n \mathbb{E} \exp(-|Z_j|^2/t^2). \tag{3.1} \]
To bound each expectation, we use the distribution integral formula followed by a change of variables.
Thus

\[ \mathbb{E} \exp(-|Z_j|^2/t^2) = \int_0^1 \mathbb{P}\{ \exp(-|Z_j|^2/t^2) > x \} \, dx = \int_0^\infty 2y e^{-y^2} \, \mathbb{P}\{ |Z_j| < ty \} \, dy. \]

By assumption, we have $\mathbb{P}\{ |Z_j| < ty \} \le M(ty + t_0)$. Substituting this into the integral and
evaluating it, we obtain

\[ \mathbb{E} \exp(-|Z_j|^2/t^2) \le CM(t + t_0). \]

Finally, substituting this into (3.1), we see that the probability in question is bounded by
$e^n [CM(t + t_0)]^n$. This completes the proof of the lemma. □
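
For the record, the integral in the last step can be evaluated in closed form. The computation below is a sketch of that evaluation, using $\int_0^\infty 2y e^{-y^2} \, dy = 1$ and $\int_0^\infty 2y^2 e^{-y^2} \, dy = \sqrt{\pi}/2$. It suggests that the constant in the lemma can be taken to be $e$, a value the text itself does not specify.

```latex
\[
  \int_0^\infty 2y e^{-y^2} \, M(ty + t_0) \, dy
  = Mt \int_0^\infty 2y^2 e^{-y^2} \, dy + Mt_0 \int_0^\infty 2y e^{-y^2} \, dy
  = M \Big( \frac{\sqrt{\pi}}{2} \, t + t_0 \Big)
  \le M(t + t_0),
\]
so that
\[
  e^n \prod_{j=1}^n \mathbb{E} \exp(-|Z_j|^2/t^2)
  \le e^n [M(t + t_0)]^n
  = [eM(t + t_0)]^n .
\]
```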


4. Reduction of delocalization to invertibility of random matrices 


In this section, we show how to deduce delocalization from quantitative invertibility of random
matrices. We outlined this reduction in Section 2.1 and will now make it formal. For simplicity of
notation, we shall assume that $\varepsilon n/2 \in \mathbb{N}$, and we introduce the localization event

\[ \mathrm{Loc}(A, \varepsilon, \delta) := \big\{ \exists \text{ eigenvector } v \in S_{\mathbb{C}}^{n-1},\ \exists I \subset [n],\ |I| = \varepsilon n : \|v_I\|_2 < \delta \big\}. \]

Since we assume in Theorem 1.3 that the boundedness event $\mathcal{B}_{A,M}$ holds with probability at least
1/2, the conclusion of that theorem can be stated as follows:

\[ \mathbb{P}\{ \mathrm{Loc}(A, \varepsilon, (\varepsilon s)^6) \text{ and } \mathcal{B}_{A,M} \} \le (cs)^{\varepsilon n}. \tag{4.1} \]

The following proposition reduces proving delocalization results like (4.1) to an invertibility bound. 


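Proposition 4.1 (Reduction of delocalization to invertibility). Let $A$ be an $n \times n$ random matrix
with arbitrary distribution. Let $M > 1$ and $\varepsilon, p_0, \delta \in (0, 1/2)$. Assume that for any number $\lambda_0 \in \mathbb{C}$,
$|\lambda_0| \le M\sqrt{n}$, and for any set $I \subset [n]$, $|I| = \varepsilon n$, we have

\[ \mathbb{P}\{ s_{\min}((A - \lambda_0)_{I^c}) \le 8\delta M\sqrt{n} \text{ and } \mathcal{B}_{A,M} \} \le p_0. \tag{4.2} \]

Then

\[ \mathbb{P}\{ \mathrm{Loc}(A, \varepsilon, \delta) \text{ and } \mathcal{B}_{A,M} \} \le 5\delta^{-2} (e/\varepsilon)^{\varepsilon n} p_0. \]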


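Proof. Assume both the localization event and the boundedness event $\mathcal{B}_{A,M}$ hold. Using the definition
of $\mathrm{Loc}(A, \varepsilon, \delta)$, choose a localized eigenvalue-eigenvector pair $(\lambda, v)$ and an index subset $I$.
Decomposing the eigenvector as

\[ v = v_I + v_{I^c} \]

and multiplying it by $A - \lambda$, we obtain

\[ 0 = (A - \lambda)v = (A - \lambda)_I v_I + (A - \lambda)_{I^c} v_{I^c}. \]

By triangle inequality, this yields

\[ \|(A - \lambda)_{I^c} v_{I^c}\|_2 = \|(A - \lambda)_I v_I\|_2 \le (\|A\| + |\lambda|) \|v_I\|_2. \]

By the localization event $\mathrm{Loc}(A, \varepsilon, \delta)$, we have $\|v_I\|_2 < \delta$. By the boundedness event $\mathcal{B}_{A,M}$ and
since $\lambda$ is an eigenvalue of $A$, we have $|\lambda| \le \|A\| \le M\sqrt{n}$. Therefore

\[ \|(A - \lambda)_{I^c} v_{I^c}\|_2 \le 2M\delta\sqrt{n}. \tag{4.3} \]

This happens for some $\lambda$ in the disc $\{ z \in \mathbb{C} : |z| \le M\sqrt{n} \}$. We will now run a covering argument
in order to fix $\lambda$. Let $\mathcal{N}$ be a $(2M\delta\sqrt{n})$-net of that disc. One can construct $\mathcal{N}$ so that

\[ |\mathcal{N}| \le \frac{5}{\delta^2}. \]

Choose $\lambda_0 \in \mathcal{N}$ so that $|\lambda_0 - \lambda| \le 2M\delta\sqrt{n}$. By (4.3), we have

\[ \|(A - \lambda_0)_{I^c} v_{I^c}\|_2 \le 4M\delta\sqrt{n}. \tag{4.4} \]

Since $\|v_I\|_2 < \delta < 1/2$, we have $\|v_{I^c}\|_2 \ge \|v\|_2 - \|v_I\|_2 \ge 1/2$. Therefore, (4.4) implies that

\[ s_{\min}((A - \lambda_0)_{I^c}) \le 8M\delta\sqrt{n}. \tag{4.5} \]

Summarizing, we have shown that the events $\mathrm{Loc}(A, \varepsilon, \delta)$ and $\mathcal{B}_{A,M}$ imply the existence of a
subset $I \subset [n]$, $|I| = \varepsilon n$, and a number $\lambda_0 \in \mathcal{N}$, such that (4.5) holds. Furthermore, for fixed $I$ and
$\lambda_0$, assumption (4.2) states that (4.5) together with $\mathcal{B}_{A,M}$ hold with probability at most $p_0$. So by
the union bound we conclude that

\[ \mathbb{P}\{ \mathrm{Loc}(A, \varepsilon, \delta) \text{ and } \mathcal{B}_{A,M} \} \le \binom{n}{\varepsilon n} \cdot |\mathcal{N}| \cdot p_0 \le \Big( \frac{e}{\varepsilon} \Big)^{\varepsilon n} \cdot \frac{5}{\delta^2} \cdot p_0. \]

This completes the proof of the proposition. □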
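
The covering step of this proof can be made concrete. After rescaling by $M\sqrt{n}$, one needs a $2\delta$-net of the unit disc in the complex plane with at most $5/\delta^2$ points. The Python sketch below builds one such net from a square grid and checks the covering property on random sample points. The grid construction, the value of $\delta$ and all names are assumptions made for illustration; the paper does not say how the net is constructed.

```python
import math
import random

def disc_net(delta):
    """A (2*delta)-net of the closed unit disc in C, built from a square grid.

    Grid points with spacing h = 2*sqrt(2)*delta have covering radius
    h / sqrt(2) = 2*delta. Points slightly outside the disc are projected
    radially onto it, which does not increase distances to points of the disc.
    """
    h = 2.0 * math.sqrt(2.0) * delta
    m = math.ceil((1.0 + 2.0 * delta) / h)
    net = set()
    for i in range(-m, m + 1):
        for j in range(-m, m + 1):
            z = complex(i * h, j * h)
            if abs(z) <= 1.0 + 2.0 * delta + 1e-12:
                net.add(z / abs(z) if abs(z) > 1.0 else z)
    return list(net)

def distance_to_net(z, net):
    """Distance from z to the nearest net point."""
    return min(abs(z - w) for w in net)

delta = 0.1
net = disc_net(delta)
rng = random.Random(2)   # fixed seed, for reproducibility only
samples = []
while len(samples) < 2000:
    z = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    if abs(z) <= 1.0:
        samples.append(z)
worst = max(distance_to_net(z, net) for z in samples)
print(len(net), "net points; bound 5/delta^2 =", 5 / delta ** 2)
print("largest distance from a sample to the net:", worst)
```

The net points stay inside the disc, which matters because the invertibility assumption of the proposition is only available for shifts of modulus at most $M\sqrt{n}$.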

5. Invertibility for continuous distributions 

The reduction we made in the previous section puts invertibility of random matrices into the
spotlight. Our goal becomes to establish invertibility property (4.2). In this section we do this for
matrices with continuous distributions.

Theorem 5.1 (Invertibility: continuous distributions). Let $A$ be an $n \times n$ random matrix satisfying
the assumptions of Theorem 1.3. Let $M > 1$, $\varepsilon \in (0, 1)$, and let $I \subset [n]$ be any fixed subset with
$|I| = \varepsilon n$. Then for any $t > 0$, we have

\[ \mathbb{P}\{ s_{\min}(A_{I^c}) \le t\sqrt{n} \text{ and } \mathcal{B}_{A,M} \} \le (CKMt^{0.4}\varepsilon^{-1.4})^{\varepsilon n/2}. \]

Before we pass to the proof of this result, let us first see how it implies delocalization. 

5.1. Deduction of Delocalization Theorem 1.3. Let $A$ be a matrix as in Theorem 1.3. We are
going to use Proposition 4.1, so let us choose $\lambda_0$ and $I$ as in that proposition and try to check the
invertibility condition (4.2). Observe that the shifted matrix $A - \lambda_0$ still satisfies the assumptions
of Theorem 5.1, and $\mathcal{B}_{A,M}$ implies $\mathcal{B}_{A - \lambda_0, 2M}$ because $|\lambda_0| \le M\sqrt{n}$. So we can apply Theorem 5.1
for $A - \lambda_0$ and with $2M$, which yields

\[ \mathbb{P}\{ s_{\min}((A - \lambda_0)_{I^c}) \le t\sqrt{n} \text{ and } \mathcal{B}_{A,M} \} \le p_0, \]

for $t > 0$, where $p_0 = (CKMt^{0.4}\varepsilon^{-1.4})^{\varepsilon n/2}$. Therefore invertibility condition (4.2) holds for
$\delta = t/8M$ and $p_0$. Applying Proposition 4.1, we conclude that

\[ \mathbb{P}\{ \mathrm{Loc}(A, \varepsilon, t/8M) \text{ and } \mathcal{B}_{A,M} \} \le 5 \Big( \frac{8M}{t} \Big)^2 (e/\varepsilon)^{\varepsilon n} p_0 \]

for $t > 0$. Setting $t = 8M(\varepsilon s)^6$ and substituting the value of $p_0$, we obtain

\[ \mathbb{P}\{ \mathrm{Loc}(A, \varepsilon, (\varepsilon s)^6) \text{ and } \mathcal{B}_{A,M} \} \le (C(K, M)s)^{\varepsilon n} \]

for $s > 0$. This completes the proof of Theorem 1.3. □

The proof of Theorem 5.1 will occupy the rest of this section. 

5.2. Decomposition of the matrix. To make the proof of Theorem 5.1 more convenient, let us
change notation slightly, namely replace $\varepsilon$ with $2\varepsilon$. Thus $A$ is a $(1 + 2\varepsilon)n \times (1 + 2\varepsilon)n$ matrix and
$|I| = 2\varepsilon n$. The desired conclusion then would change to

\[ \mathbb{P}\{ s_{\min}(A_{I^c}) \le t\sqrt{n} \text{ and } \mathcal{B}_{A,M} \} \le (CKMt^{0.4}\varepsilon^{-1.4})^{\varepsilon n}. \tag{5.1} \]

Without loss of generality, we can assume that $I$ is the interval of the last $2\varepsilon n$ indices.

Let us decompose $A_{I^c}$ as follows:

\[ A_{I^c} = \begin{bmatrix} B \\ G \end{bmatrix}, \tag{5.2} \]

where $B$ and $G$ are rectangular matrices of size $(1 + \varepsilon)n \times n$ and $\varepsilon n \times n$ respectively. By Assumption
1.1, the random matrices $B$ and $G$ are independent, and moreover all entries of $G$ are independent.
We are going to show that either $\|Bx\|_2$ or $\|Gx\|_2$ is nicely bounded below for every vector
$x \in S_{\mathbb{C}}^{n-1}$. To control $B$, we use the negative second moment identity to bound the Hilbert-Schmidt
norm of the pseudo-inverse of $B$. We deduce from it that most singular values of $B$ are
not too small: namely, all but $0.1\varepsilon n$ singular values are $\gtrsim \sqrt{\varepsilon n}$. It follows that
$B$ is nicely bounded below when restricted onto a subspace of codimension $0.1\varepsilon n$. (This subspace
is formed by the corresponding singular vectors.) Next, we condition on $B$ and we use $G$ to control
the remaining $0.1\varepsilon n$ dimensions. A simple covering argument shows that $G$ is nicely bounded below
when restricted to a subspace of dimension $0.1\varepsilon n$. Therefore, either $B$ or $G$ is nicely bounded below
on the entire space, and thus $A_{I^c}$ is nicely bounded below on the entire space as well.

We will now pass to a detailed proof of Theorem 5.1. 

5.3. Distances between random vectors and subspaces. In this section we start working 
toward bounding B below on a large subspace. We quickly reduce this problem to a control of the 
distance between a random vector (a column of B ) and a random subspace (the span of the rest of 
the columns). We then prove a lower bound for this distance. 

5.3.1. Negative second moment identity. The negative second moment identity [34, Lemma A.4]
expresses the Hilbert-Schmidt norm of the pseudo-inverse of $B$ as follows:

\[ \sum_{j=1}^n s_j(B)^{-2} = \sum_{j=1}^n \mathrm{dist}(B_j, H_j)^{-2}, \]

where $s_j(B)$ denote the singular values of $B$, $B_j$ denote the columns of $B$, and $H_j = \mathrm{span}(B_k)_{k \ne j}$.
To bound the sum above, we will establish a lower bound on the distance between the random
vector $B_j \in \mathbb{C}^{(1+\varepsilon)n}$ and random subspace $H_j \subset \mathbb{C}^{(1+\varepsilon)n}$ of complex dimension $n - 1$.
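
The identity is easy to test numerically on a small example. The following Python sketch does so for a real Gaussian matrix, using the fact that the sum of the inverse squared singular values equals the trace of the inverse Gram matrix of the columns. Restricting to real entries, the matrix size and the helper names are simplifying assumptions made here; the paper works with complex matrices.

```python
import random

def gram_schmidt_distance(vec, others):
    """Euclidean distance from vec to the span of the vectors in others."""
    basis = []
    for w in others:                         # orthonormalize the spanning set
        for q in basis:
            c = sum(a * b for a, b in zip(w, q))
            w = [a - c * b for a, b in zip(w, q)]
        norm = sum(a * a for a in w) ** 0.5
        basis.append([a / norm for a in w])
    r = list(vec)
    for q in basis:                          # subtract the projection onto the span
        c = sum(a * b for a, b in zip(r, q))
        r = [a - c * b for a, b in zip(r, q)]
    return sum(a * a for a in r) ** 0.5

def trace_of_inverse(G):
    """Trace of the inverse of a square matrix, via Gauss-Jordan elimination."""
    n = len(G)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(G)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return sum(aug[i][n + i] for i in range(n))

rng = random.Random(3)   # fixed seed, for reproducibility only
rows, n = 6, 4
cols = [[rng.gauss(0, 1) for _ in range(rows)] for _ in range(n)]   # columns B_1..B_n

# Left side: sum of s_j(B)^(-2) equals the trace of the inverse Gram matrix B^T B.
gram = [[sum(a * b for a, b in zip(u, w)) for w in cols] for u in cols]
lhs = trace_of_inverse(gram)

# Right side: sum over columns of dist(B_j, H_j)^(-2).
rhs = sum(gram_schmidt_distance(cols[j], cols[:j] + cols[j + 1:]) ** -2
          for j in range(n))
print(lhs, rhs)
```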

5.3.2. Enforcing independence of vectors and subspaces. Let us fix $j$. If all entries of $B$ are independent,
then $B_j$ and $H_j$ are independent. However, Assumption 1.1 leaves a possibility for $B_j$ to
be correlated with the $j$-th row of $B$. This means that $B_j$ and $H_j$ may be dependent, which would
complicate the distance computation.

There is a simple way to remove the dependence by projecting out the $j$-th coordinate. Namely,
let $B_j' \in \mathbb{C}^{(1+\varepsilon)n-1}$ denote the vector $B_j$ with the $j$-th coordinate removed, and let $H_j' = \mathrm{span}(B_k')_{k \ne j}$.
We note two key facts. First, $B_j'$ and $H_j'$ are independent by Assumption 1.1. Second,

\[ \mathrm{dist}(B_j, H_j) \ge \mathrm{dist}(B_j', H_j'), \tag{5.3} \]

since the distance between two vectors can only decrease after removing a coordinate.
Summarizing, we have

\[ \sum_{j=1}^n s_j(B)^{-2} \le \sum_{j=1}^n \mathrm{dist}(B_j', H_j')^{-2}. \tag{5.4} \]

Recall that $B_j' \in \mathbb{C}^{(1+\varepsilon)n-1}$ is a random vector with independent entries whose real parts have
densities bounded by $K$ (by Assumptions 1.1 and 1.2); and $H_j'$ is an independent subspace of
$\mathbb{C}^{(1+\varepsilon)n-1}$ of complex dimension $n - 1$.

We are looking for a lower bound for the distances $\mathrm{dist}(B_j', H_j')$. It is convenient to represent
them via the orthogonal projection of $B_j'$ onto $(H_j')^\perp$:

\[ \mathrm{dist}(B_j', H_j') = \|P_{E_j} B_j'\|_2, \quad \text{where } E_j = (H_j')^\perp. \tag{5.5} \]
5.3.3. Transferring the problem from $\mathbb{C}$ to $\mathbb{R}$. We will now transfer the distance problem from the
complex to the real field. To this end, we define the operation $z \mapsto \tilde{z}$ that makes complex vectors
real in the obvious way:

\[ \text{for } z = x + iy \in \mathbb{C}^N, \text{ define } \tilde{z} = \begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^{2N}. \]

Similarly, we can make a complex subspace $E \subset \mathbb{C}^N$ real by defining

\[ \tilde{E} = \{ \tilde{z} : z \in E \} \subset \mathbb{R}^{2N}. \]

Note that this operation doubles the dimension of $E$.

Let us record two properties that follow straight from this definition. 

Lemma 5.2 (Elementary properties of operation $x \mapsto \tilde{x}$). 1. For a complex subspace $E$ and a vector $z$, one has

$$\widetilde{P_E z} = P_{\tilde{E}}\, \tilde{z}.$$

2. For a complex-valued random vector $X$ and $r > 0$, one has

$$\mathcal{L}(\tilde{X}, r) = \mathcal{L}(X, r).$$

Recall that the second part of this lemma is about the concentration function $\mathcal{L}(X, r)$ we introduced in Section 3.

After applying the operation $z \mapsto \tilde{z}$ to the random vector $B'_j$ in (5.4), we encounter a problem.
Since the imaginary part of $B'_j$ is fixed by Assumption 1.1, only half of the coordinates of $\tilde{B}'_j$
will be random, and that will not be enough for us. The following lemma solves this problem by
randomizing all coordinates.

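Lemma 5.3 (Randomizing all coordinates). Consider a random vector $Z = X + iY \in \mathbb{C}^N$ whose
imaginary part $Y \in \mathbb{R}^N$ is fixed. Set $\hat{Z} = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \in \mathbb{R}^{2N}$ where $X_1$ and $X_2$ are independent copies of
$X$. Let $E$ be a subspace of $\mathbb{C}^N$. Then

$$\mathcal{L}(P_E Z, r) \le \mathcal{L}(P_{\tilde{E}} \hat{Z}, 2r)^{1/2}, \quad r > 0.$$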

Proof. Recalling the definition of the concentration function, in order to bound $\mathcal{L}(P_E Z, r)$ we need
to choose arbitrary $a \in \mathbb{C}^N$ and find a uniform bound on the probability

$$p := \mathbb{P}\{\|P_E Z - a\|_2 \le r\}.$$

By assumption, the random vector $Z = X + iY$ has fixed imaginary part $Y$. So it is convenient to
express the probability as

$$p = \mathbb{P}\{\|P_E X - b\|_2 \le r\}$$

where $b = a - P_E(iY)$ is fixed. Let us rewrite this identity using independent copies $X_1$ and $X_2$ of
$X$ as follows:

$$p = \mathbb{P}\{\|P_E X_1 - b\|_2 \le r\} = \mathbb{P}\{\|P_E(iX_2) - ib\|_2 \le r\}.$$

(The last equality follows trivially by multiplying by $i$ inside the norm.) By independence of $X_1$
and $X_2$ and using the triangle inequality, we obtain

$$\begin{aligned}
p^2 &= \mathbb{P}\{\|P_E X_1 - b\|_2 \le r \text{ and } \|P_E(iX_2) - ib\|_2 \le r\} \\
&\le \mathbb{P}\{\|P_E(X_1 + iX_2) - b - ib\|_2 \le 2r\} \\
&\le \mathcal{L}(P_E(X_1 + iX_2), 2r).
\end{aligned}$$

Further, using part 2 and then part 1 of Lemma 5.2, we see that

$$\mathcal{L}(P_E(X_1 + iX_2), 2r) = \mathcal{L}\big(\widetilde{P_E(X_1 + iX_2)}, 2r\big) = \mathcal{L}(P_{\tilde{E}} \hat{Z}, 2r).$$

Thus we showed that $p^2 \le \mathcal{L}(P_{\tilde{E}} \hat{Z}, 2r)$ uniformly in $a$. By definition of the concentration function,
this completes the proof. □


5.3.4. Bounding the distances below. We are ready to control the distances appearing in (5.5). 


Lemma 5.4 (Distance between random vectors and subspaces). For every $j \in [n]$ and $\tau > 0$, we
have

$$\mathbb{P}\{\operatorname{dist}(B'_j, H'_j) \le \tau\sqrt{\varepsilon n}\} \le (CK\tau)^{\varepsilon n}. \tag{5.6}$$

Proof. Representing the distances via projections of $B'_j$ onto the subspaces $E_j = (H'_j)^\perp$ as in (5.5),
and using the definition of the concentration function, we have

$$p_j := \mathbb{P}\{\operatorname{dist}(B'_j, H'_j) \le \tau\sqrt{\varepsilon n}\} \le \mathcal{L}(P_{E_j} B'_j, \tau\sqrt{\varepsilon n}).$$

Recall that $B'_j$ and $E_j$ are independent, and let us condition on $E_j$. Lemma 5.3 implies that

$$p_j \le \mathcal{L}(P_{\tilde{E}_j} \hat{Z}, 2\tau\sqrt{\varepsilon n})^{1/2}$$

where $\hat{Z}$ is a random vector with independent coordinates that have densities bounded by $K$.

Recall that $H'_j$ has codimension $\varepsilon n$; thus $E_j$ has dimension $\varepsilon n$ and $\tilde{E}_j$ has dimension $2\varepsilon n$. We
can use a bound on the small ball probability from [27], which states that the density of $P_{\tilde{E}_j}\hat{Z}$ is
bounded by $(CK)^{2\varepsilon n}$. Integrating the density over a ball of radius $2\tau\sqrt{\varepsilon n}$ in the subspace $\tilde{E}_j$ that
has volume $(C\tau)^{2\varepsilon n}$, we conclude that

$$\mathcal{L}(P_{\tilde{E}_j} \hat{Z}, 2\tau\sqrt{\varepsilon n}) \le (CK\tau)^{2\varepsilon n}.$$

It follows that

$$p_j \le (CK\tau)^{\varepsilon n},$$

as claimed. The proof of Lemma 5.4 is complete. □


5.4. $B$ is bounded below on a large subspace $E^+$.

5.4.1. Plugging the distance bound into second moment inequality. In order to substitute the bound
(5.6) into the negative second moment inequality (5.4), let us recall some classical facts about the
weak $L^p$ norms. The weak $L^p$ norm of a random variable $Y$ is defined as

$$\|Y\|_{p,\infty} = \sup_{t > 0}\, t \cdot (\mathbb{P}\{|Y| > t\})^{1/p}.$$

This is not a norm but is equivalent to a norm if $p > 1$. In particular, the weak triangle inequality
holds:

$$\Big\| \sum_i Y_i \Big\|_{p,\infty} \le C(p) \sum_i \|Y_i\|_{p,\infty} \tag{5.7}$$

where $C(p)$ is bounded above by an absolute constant for $p \ge 2$, see [29], Theorem 3.21.

The bound (5.6) means that $Y_i := \operatorname{dist}(B'_i, H'_i)^{-2}$ are in weak $L^p$ for $p = \varepsilon n/2$, and that $\|Y_i\|_{p,\infty} \le C^2K^2/(\varepsilon n)$. Since by assumption $p \ge 2$, the weak triangle inequality (5.7) yields $\|\sum_{i=1}^n Y_i\|_{p,\infty} \le C_2^2K^2/\varepsilon$. This in turn means that

$$\mathbb{P}\Big\{ \sum_{i=1}^n \operatorname{dist}(B'_i, H'_i)^{-2} > \frac{1}{\tau^2\varepsilon} \Big\} \le (C_2K\tau)^{\varepsilon n}, \quad \tau > 0.$$

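Therefore, by the second negative moment inequality (5.4), the event

$$\mathcal{E}_1 := \Big\{ \sum_{i=1}^n s_i(B)^{-2} \le \frac{1}{\tau^2\varepsilon} \Big\} \tag{5.8}$$

is likely: $\mathbb{P}((\mathcal{E}_1)^c) \le (C_2K\tau)^{\varepsilon n}$.

5.4.2. A large subspace $E^+$ on which $B$ is bounded below. Fix a parameter $\tau > 0$ for now, and
assume that the event (5.8) occurs. By Markov's inequality, for any $\delta > 0$ we have

$$\big|\{i : s_i(B) \le \delta\sqrt{n}\}\big| = \Big|\Big\{i : s_i(B)^{-2} \ge \frac{1}{\delta^2 n}\Big\}\Big| \le \frac{\delta^2 n}{\tau^2\varepsilon}.$$

Let $c \in (0,1)$ be a small absolute constant. Choosing $\delta = c\tau\varepsilon$, the right-hand side equals $c^2\varepsilon n$, so we have

$$\big|\{i : s_i(B) \le c\tau\varepsilon\sqrt{n}\}\big| \le c\varepsilon n. \tag{5.9}$$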

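Let $v_i(B)$ be the right singular vectors of $B$, and consider the (random) orthogonal decomposition
$\mathbb{C}^n = E^- \oplus E^+$, where

$$E^- = \operatorname{span}\{v_i(B) : s_i(B) \le c\tau\varepsilon\sqrt{n}\}, \quad E^+ = \operatorname{span}\{v_i(B) : s_i(B) > c\tau\varepsilon\sqrt{n}\}.$$

Inequality (5.9) means that $\dim_{\mathbb{C}}(E^-) \le c\varepsilon n$.

Let us summarize. We obtained that the event

$$\mathcal{D}_{E^-} := \{\dim(E^-) \le c\varepsilon n\} \quad \text{satisfies} \quad \mathbb{P}((\mathcal{D}_{E^-})^c) \le (C_2K\tau)^{\varepsilon n}, \tag{5.10}$$

so $E^-$ is likely to be a small subspace and $E^+$ a large subspace. Moreover, by definition, $B$ is nicely
bounded below on $E^+$:

$$\inf_{x \in S_{E^+}} \|Bx\|_2 \ge c\tau\varepsilon\sqrt{n}. \tag{5.11}$$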

5.5. $G$ is bounded below on the small complementary subspace $E^-$. Recall that the subspaces $E^+$ and $E^-$ are determined by the sub-matrix $B$, so these subspaces are independent of $G$
by Assumption 1.1. Let us fix $B$ so that $\dim(E^-) \le c\varepsilon n$; recall this is a likely event by (5.10).

Note that $G$ is an $\varepsilon n \times n$ random matrix with independent entries. We are going to show that $G$
is well bounded below when restricted onto the fixed subspace $E^-$. This can be done by a standard
covering argument, where a lower bound is first proved for a fixed vector, then extended to a $\delta$-net
of the sphere by a union bound, and finally to the whole sphere by approximation.


5.5.1. Lower bounds on a fixed vector. 

Lemma 5.5 (Lower bound for a fixed row and vector). Let $G_j$ denote the $j$-th row of $G$. Then for
each $j$, $z \in S_{\mathbb{C}}^{n-1}$, and $\theta > 0$, we have

$$\mathbb{P}\{|\langle G_j, z\rangle| \le \theta\} \le C_0K\theta. \tag{5.12}$$

Proof. Fix $j$ and consider the random vector $Z = G_j$. Expressing $Z$ and $z$ in terms of their real
and imaginary parts as

$$Z = X + iY, \quad z = x + iy,$$

we can write the inner product as

$$\langle Z, z\rangle = [\langle X, x\rangle - \langle Y, y\rangle] + i\,[\langle X, y\rangle + \langle Y, x\rangle].$$

Since $z$ is a unit vector, either $x$ or $y$ has norm at least $1/2$. Assume without loss of generality
that $\|x\|_2 \ge 1/2$. Dropping the imaginary part, we obtain

$$|\langle Z, z\rangle| \ge |\langle X, x\rangle - \langle Y, y\rangle|.$$

Recall that the imaginary part $Y$ is fixed by Assumption 1.1. Thus

$$\mathbb{P}\{|\langle Z, z\rangle| \le \theta\} \le \mathcal{L}(\langle X, x\rangle, \theta). \tag{5.13}$$

We can express $\langle X, x\rangle$ in terms of the coordinates of $X$ and $x$ as the sum

$$\langle X, x\rangle = \sum_{k=1}^n X_k x_k.$$

Since $X_k$ are the real parts of independent entries of $G$, Assumptions 1.1 and 1.2 imply that $X_k$
are independent random variables with densities bounded by $K$. Recalling that $\sum_{k=1}^n x_k^2 = \|x\|_2^2 \ge 1/4$,
we can apply a known result about the densities of sums of independent random variables, see [27].
It states that the density of $\sum_{k=1}^n X_k x_k$ is bounded by $CK$. It follows that

$$\mathcal{L}(\langle X, x\rangle, \theta) \le CK\theta. \tag{5.14}$$

Substituting this into (5.13) completes the proof of Lemma 5.5. □

Lemma 5.6 (Lower bound for a fixed vector). For each $x \in S_{\mathbb{C}}^{n-1}$ and $\theta > 0$, we have

$$\mathbb{P}\{\|Gx\|_2 \le \theta\sqrt{\varepsilon n}\} \le (C_0K\theta)^{\varepsilon n}.$$

Proof. We can represent $\|Gx\|_2^2$ as a sum of independent random variables $\sum_{j=1}^{\varepsilon n} |\langle G_j, x\rangle|^2$. Each of
the terms $\langle G_j, x\rangle$ satisfies (5.12). Then the conclusion follows from Tensorization Lemma 3.3. □

5.5.2. Lower bound on a subspace. 

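Lemma 5.7 (Lower bound on a subspace). Let $M \ge 1$ and $\mu \in (0,1)$. Let $E$ be a fixed subspace
of $\mathbb{C}^n$ of dimension at most $\mu\varepsilon n$. Then, for every $\theta > 0$, we have

$$\mathbb{P}\Big\{ \inf_{x \in S_E} \|Gx\|_2 \le \theta\sqrt{\varepsilon n} \text{ and } \mathcal{B}_{G,M} \Big\} \le \Big[ CK (M/\sqrt{\varepsilon})^{2\mu}\, \theta^{1-2\mu} \Big]^{\varepsilon n}. \tag{5.15}$$

Proof. Let $\delta \in (0,1)$ to be chosen later. Since the dimension of $\tilde{E} \subseteq \mathbb{R}^{2n}$ is at most $2\mu\varepsilon n$, standard
volume considerations imply the existence of a $\delta$-net $\mathcal{N} \subseteq S_E$ with

$$|\mathcal{N}| \le \Big(\frac{3}{\delta}\Big)^{2\mu\varepsilon n}. \tag{5.16}$$

Assume that the event in the left hand side of (5.15) occurs. Choose $x \in S_E$ such that $\|Gx\|_2 \le \theta\sqrt{\varepsilon n}$. Next, choose $x_0 \in \mathcal{N}$ such that $\|x - x_0\|_2 \le \delta$. By the triangle inequality and using $\mathcal{B}_{G,M}$, we
obtain

$$\|Gx_0\|_2 \le \|Gx\|_2 + \|G\| \cdot \|x - x_0\|_2 \le \theta\sqrt{\varepsilon n} + M\sqrt{n} \cdot \delta.$$

Choosing $\delta := \theta\sqrt{\varepsilon}/M$, we conclude that $\|Gx_0\|_2 \le 2\theta\sqrt{\varepsilon n}$.

Summarizing, we obtained that the probability of the event in the left hand side of (5.15) is
bounded by

$$\mathbb{P}\{\exists x_0 \in \mathcal{N} : \|Gx_0\|_2 \le 2\theta\sqrt{\varepsilon n}\}.$$

By Lemma 5.6 and a union bound, this in turn is bounded by

$$|\mathcal{N}| \cdot (2C_0K\theta)^{\varepsilon n} \le \Big(\frac{3M}{\theta\sqrt{\varepsilon}}\Big)^{2\mu\varepsilon n} (2C_0K\theta)^{\varepsilon n}$$

where we used (5.16) and our choice of $\delta$. Rearranging the terms completes the proof. □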


5.5.3. Conclusion: $G$ is bounded below on the small subspace $E^-$. We can apply Lemma 5.7 for the
subspace $E = E^-$ constructed in Section 5.4.2. We do this conditionally on $B$, for a fixed choice
of $E^-$ that satisfies $\mathcal{D}_{E^-}$, thus for $\mu \le c \le 0.05$. This yields the following.

Lemma 5.8 ($G$ is bounded below on $E^-$). For every $\theta > 0$, we have

$$\mathbb{P}\Big\{ \inf_{x \in S_{E^-}} \|Gx\|_2 \le \theta\sqrt{\varepsilon n} \text{ and } \mathcal{D}_{E^-} \text{ and } \mathcal{B}_{G,M} \Big\} \le \big( CK M^{0.1} \varepsilon^{-0.05} \theta^{0.9} \big)^{\varepsilon n}. \qquad □$$

Recall that $\mathcal{D}_{E^-}$ is the likely event defined in (5.10).

5.6. Proof of invertibility.

5.6.1. Decomposing invertibility. The following lemma reduces invertibility of $A$ to invertibility of
$B$ on $E^+$ and $G$ on $E^-$.

Lemma 5.9 (Decomposition). Let $A$ be an $m \times n$ matrix. Let us decompose $A$ as

$$A = \begin{bmatrix} B \\ G \end{bmatrix}, \quad B \in \mathbb{C}^{m_1 \times n}, \quad G \in \mathbb{C}^{m_2 \times n}, \quad m = m_1 + m_2.$$

Consider the orthogonal decomposition $\mathbb{C}^n = E^- \oplus E^+$ where $E^-$ and $E^+$ are eigenspaces² of $B^*B$.
Denote

$$s_A = s_{\min}(A), \quad s_B = s_{\min}(B|_{E^+}), \quad s_G = s_{\min}(G|_{E^-}).$$

Then

$$s_A \ge \frac{s_B s_G}{4\|A\|}. \tag{5.17}$$


Proof. Let $x \in S^{n-1}$. We consider the orthogonal decomposition

$$x = x^- + x^+, \quad x^- \in E^-, \quad x^+ \in E^+.$$

We can also decompose $Ax$ as

$$\|Ax\|_2^2 = \|Bx\|_2^2 + \|Gx\|_2^2.$$

Let us fix a parameter $\lambda \in (0, 1/2)$ and consider two cases.

Case 1: $\|x^+\|_2 > \lambda$. Then

$$\|Ax\|_2 \ge \|Bx\|_2 \ge \|Bx^+\|_2 \ge s_B \cdot \lambda.$$

Case 2: $\|x^+\|_2 \le \lambda$. In this case, $\|x^-\|_2 = \sqrt{1 - \|x^+\|_2^2} \ge 1/2$. Thus

$$\begin{aligned}
\|Ax\|_2 \ge \|Gx\|_2 &\ge \|Gx^-\|_2 - \|Gx^+\|_2 \\
&\ge \|Gx^-\|_2 - \|G\| \cdot \|x^+\|_2 \ge s_G \cdot \tfrac{1}{2} - \|G\| \cdot \lambda.
\end{aligned}$$

Using that $\|G\| \le \|A\|$, we conclude that

$$s_A = \inf_{x \in S^{n-1}} \|Ax\|_2 \ge \min\Big( s_B \cdot \lambda,\; s_G \cdot \tfrac{1}{2} - \|A\| \cdot \lambda \Big).$$

Optimizing the parameter $\lambda$, we conclude that

$$s_A \ge \frac{s_B s_G}{2(s_B + \|A\|)}.$$

Using that $s_B$ is bounded by $\|A\|$, we complete the proof. □
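The optimization over $\lambda$ in the proof of Lemma 5.9 can be checked numerically. The sketch below balances the two branches of the minimum; the function names and the numerical values of $s_B$, $s_G$, $\|A\|$ are hypothetical and chosen only for illustration.

```python
def lower_bound(lam, s_B, s_G, norm_A):
    """min(s_B*lam, s_G/2 - norm_A*lam): the bound on s_A obtained from Cases 1 and 2."""
    return min(s_B * lam, s_G / 2 - norm_A * lam)

def best_lambda(s_B, s_G, norm_A):
    """Value of lam at which the increasing and the decreasing branch coincide."""
    return s_G / (2 * (s_B + norm_A))

# Hypothetical values with s_B <= norm_A and s_G <= norm_A.
s_B, s_G, norm_A = 0.3, 0.2, 2.0

lam = best_lambda(s_B, s_G, norm_A)
optimized = lower_bound(lam, s_B, s_G, norm_A)   # s_B*s_G / (2*(s_B + norm_A))
final = s_B * s_G / (4 * norm_A)                 # right-hand side of (5.17)
grid_max = max(lower_bound(k / 1000, s_B, s_G, norm_A) for k in range(1, 500))
```

Since one branch increases and the other decreases in $\lambda$, the minimum is largest where they cross; the bound $s_B \le \|A\|$ then turns $2(s_B + \|A\|)$ into $4\|A\|$.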


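5.6.2. Proof of the Invertibility Theorem 5.1. We apply Lemma 5.9 for the matrix $\bar{A}$ and the
decomposition (5.2) and obtain

$$s_B s_G \le 4\|\bar{A}\|\, s_{\bar{A}}.$$

Since $\bar{A}$ is a sub-matrix of $A$, we have $\|\bar{A}\| \le M\sqrt{n}$ on the event $\mathcal{B}_{A,M}$. Further, (5.11) yields the
bound $s_B \ge c\tau\varepsilon\sqrt{n}$. It follows that

$$\begin{aligned}
\mathbb{P}\{s_{\bar{A}} \le t\sqrt{n} \text{ and } \mathcal{B}_{A,M}\}
&\le \mathbb{P}\Big\{ s_G \le \frac{4Mt}{c\tau\varepsilon} \cdot \sqrt{n} \text{ and } \mathcal{B}_{A,M} \Big\} \\
&\le \mathbb{P}\Big\{ s_G \le \frac{4Mt}{c\tau\varepsilon^{3/2}} \cdot \sqrt{\varepsilon n} \text{ and } \mathcal{D}_{E^-} \text{ and } \mathcal{B}_{A,M} \Big\} + \mathbb{P}((\mathcal{D}_{E^-})^c).
\end{aligned} \tag{5.18}$$

² In other words, $E^-$ and $E^+$ are the spans of two disjoint subsets of right singular vectors of $B$.

The last line prepared us for an application of Lemma 5.8. Using this lemma along with the trivial
inclusion $\mathcal{B}_{A,M} \subseteq \mathcal{B}_{G,M}$ and the estimate (5.10) of the probability of $(\mathcal{D}_{E^-})^c$, we bound the quantity
in (5.18) by

$$\Big[ CKM^{0.1}\varepsilon^{-0.05} \Big( \frac{4Mt}{c\tau\varepsilon^{3/2}} \Big)^{0.9} \Big]^{\varepsilon n} + (C_2K\tau)^{\varepsilon n}.$$

This bound holds for all $\tau, t > 0$. Choosing $\tau = \sqrt{t}$ and rearranging the terms, we obtain

$$\mathbb{P}\{s_{\bar{A}} \le t\sqrt{n} \text{ and } \mathcal{B}_{A,M}\} \le \big[ CKM\varepsilon^{-1.4} t^{0.45} \big]^{\varepsilon n} + \big( CKt^{0.5} \big)^{\varepsilon n}.$$

This implies the desired conclusion (5.1). Theorem 5.1 is proved. □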


6. Invertibility for general distributions: statement of the result

We now pass to random matrices whose entries may have general, possibly discrete distributions; our goal is Delocalization Theorem 1.5. Recall that in Section 2.1 we described on
the informal level how delocalization can be reduced to invertibility of random matrices; Proposition 4.1 formalizes this reduction. This prepares us to state an invertibility result for general
random matrices, whose proof will occupy the rest of this paper.

Theorem 6.1 (Invertibility: general distributions). Let $A$ be an $n \times n$ random matrix satisfying
the assumptions of Theorem 1.5. Let $M \ge 1$, $\varepsilon \in (1/n, c)$, and let $I \subseteq [n]$ be any fixed subset with
$|I| = \varepsilon n$. Then for any

$$t \ge \frac{c}{\varepsilon n} + e^{-c/\sqrt{\varepsilon}}, \tag{6.1}$$

we have

$$\mathbb{P}\{s_{\min}(A_{I^c}) \le t\sqrt{n} \text{ and } \mathcal{B}_{A,M}\} \le \big( Ct^{0.4}\varepsilon^{-1.4} \big)^{\varepsilon n/2}.$$

The constant $C$ in the inequality above depends on $M$ and the parameters $p$ and $K$ appearing
in Assumption 1.4.

Delocalization Theorem 1.5 follows from Theorem 6.1 along the same lines as in Section 5.1. As in
that section, for a given $s$ we set $t = 8M(\varepsilon s)^6$, which leads to the particular form of the restriction
on $s$ in Theorem 1.5. This restriction, as well as the probability estimate, can be improved by
tweaking various parameters throughout the proof of Theorem 6.1. They can be further and more
significantly improved by taking into account the arithmetic structure in the small ball probability
estimate, instead of disregarding it in Section 10. We refrained from pursuing these improvements
in order to avoid overburdening the paper with technical calculations.


7. Small ball probabilities via least common denominator 

In this section, which may have an independent interest, we relate the sums of independent 
random variables and random vectors to the arithmetic structure of their coefficients. 

To see the relevance of this topic to invertibility of random matrices, we could try to extend
the argument we gave in Section 5 to general distributions. Most of the argument would go through.
However, a major difficulty occurs when we try to estimate the distance between a random vector
$X$ and a fixed subspace $H$. For discrete distributions, $\operatorname{dist}(X, H) = \|P_{H^\perp} X\|_2$ can no longer be
bounded below as easily as we did in Lemma 5.4. The source of difficulty can be best seen if we
consider the simple example where $H$ is the hyperplane orthogonal to the vector $(1, 1, 0, 0, \ldots, 0)$
and $X$ is the random Bernoulli vector (whose coefficients are independent and take values $1$ and $-1$
with probability $1/2$ each). In this case, $\operatorname{dist}(X, H) = 0$ with probability $1/2$. Even if we exclude zeros by
making $H$ orthogonal to $(1, 1, 1, 1, \ldots, 1)$, the distance would equal zero with probability $\sim n^{-1/2}$,
which decays polynomially rather than exponentially fast in $n$.
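The two Bernoulli examples can be made concrete with exact binomial counts. In the sketch below, the function name `prob_zero_sum` is an illustrative choice; it uses only the fact that $\operatorname{dist}(X, H) = 0$ exactly when $X$ is orthogonal to the normal vector of $H$.

```python
from math import comb, pi, sqrt

def prob_zero_sum(n):
    """Exact P{x_1 + ... + x_n = 0} for independent uniform signs x_i in {-1, +1}."""
    if n % 2:
        return 0.0  # an odd number of signs cannot cancel
    return comb(n, n // 2) / 2 ** n

# H orthogonal to (1, 1, 0, ..., 0): dist(X, H) = 0 iff x_1 + x_2 = 0.
p_two = prob_zero_sum(2)

# H orthogonal to (1, ..., 1): dist(X, H) = 0 iff all n signs cancel.
for n in (10, 100, 1000):
    exact = prob_zero_sum(n)
    approx = sqrt(2 / (pi * n))  # Stirling asymptotics for the central binomial term
    print(n, exact, approx)
```

For even $n$ the exact probability tracks $\sqrt{2/(\pi n)}$, which is the polynomial (not exponential) decay mentioned in the text.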

The problem with these examples is that $H^\perp$ has rigid arithmetic structure. In Section 7.1,
we will show how to quantify arithmetic structure with a notion of approximate least common
denominator (LCD). In Section 7.2 we will also provide bounds on sums of independent random vectors
in terms of LCD. Finally, in Sections 7.3 and 7.4 we will specialize these bounds for sums of
independent random variables and projections of random vectors (and in particular, for distances
to subspaces).

7.1. The least common denominator. An approximate concept of least common denominator 
(LCD) was proposed in [23] to quantify the arithmetic structure of vectors; this idea was developed 
in [25, 24, 38], see also [26]. Here we will use the version of LCD from [38]. We emphasize that 
throughout this section we consider real vectors and matrices. 

Definition 7.1 (Least common denominator). Fix $L > 0$. For a vector $v \in \mathbb{R}^N$, the least common
denominator (LCD) is defined as

$$D(v) = D(v, L) = \inf\Big\{ \theta > 0 : \operatorname{dist}(\theta v, \mathbb{Z}^N) < L\sqrt{\log_+ \frac{\|\theta v\|_2}{L}} \Big\}.$$

For a matrix $V \in \mathbb{R}^{m \times N}$, the least common denominator is defined as

$$D(V) = D(V, L) = \inf\Big\{ \|\theta\|_2 : \theta \in \mathbb{R}^m,\ \operatorname{dist}(V^{\mathsf T}\theta, \mathbb{Z}^N) < L\sqrt{\log_+ \frac{\|V^{\mathsf T}\theta\|_2}{L}} \Big\}.$$

Remark 7.2. The definition of LCD for vectors is a special case of the definition for matrices with
$m = 1$. This can be seen by considering a vector $v \in \mathbb{R}^N$ as a $1 \times N$ matrix.

Remark 7.3. In applications, we will typically choose $L \sim \sqrt{m}$, so for vectors we usually choose
$L \sim 1$.
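To get a feel for Definition 7.1, here is a minimal sketch that estimates $D(v, L)$ for a vector by scanning a grid of values of $\theta$. The grid step, the cutoff `theta_max`, and the names `lcd_grid` and `dist_to_integers` are assumptions made for illustration; the result is only an upper estimate of the infimum, accurate up to the grid resolution.

```python
import math

def dist_to_integers(x):
    """Euclidean distance from the real vector x to the lattice Z^N."""
    return math.sqrt(sum((xi - round(xi)) ** 2 for xi in x))

def lcd_grid(v, L=1.0, theta_max=50.0, step=1e-3):
    """Grid estimate (from above) of D(v, L) in Definition 7.1.

    Returns the first grid point theta with
    dist(theta*v, Z^N) < L*sqrt(log_+(||theta*v||_2 / L)),
    or None if no grid point up to theta_max qualifies.
    """
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    k = 1
    while k * step <= theta_max:
        theta = k * step
        log_plus = max(math.log(theta * norm_v / L), 0.0)
        if dist_to_integers([theta * vi for vi in v]) < L * math.sqrt(log_plus):
            return theta
        k += 1
    return None

s = 1 / math.sqrt(2)
structured = [s, s, 0.0, 0.0]  # unit vector proportional to (1, 1, 0, 0)
d = lcd_grid(structured)
```

For this arithmetically rigid vector the scan stops slightly above $\theta = 1.1$, shortly before $\theta v$ reaches the integer point $(1, 1, 0, 0)$ at $\theta = \sqrt{2}$; a small LCD is the signature of structure.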




Before relating the concept of LCD to small ball probabilities, let us pause to note a simple but
useful lower bound for LCD. To state it, for a given matrix $V$ we let $\|V\|_\infty$ denote the maximum
Euclidean norm of the columns of $V$. Note that for vectors ($1 \times N$ matrices), this quantity is the
usual $\ell_\infty$ norm.

Proposition 7.4 (Simple lower bound for LCD). For every matrix $V$ and $L > 0$, one has

$$D(V, L) \ge \frac{1}{2\|V\|_\infty}.$$

Proof. By definition of LCD, it is enough to show that for $\theta \in \mathbb{R}^m$, the inequality

$$\operatorname{dist}(V^{\mathsf T}\theta, \mathbb{Z}^N) < L\sqrt{\log_+ \frac{\|V^{\mathsf T}\theta\|_2}{L}} \tag{7.1}$$

implies $\|\theta\|_2 \ge 1/(2\|V\|_\infty)$. Assume the contrary, that there exists $\theta$ which satisfies (7.1) but for
which

$$\|\theta\|_2 < \frac{1}{2\|V\|_\infty}. \tag{7.2}$$

We can use the Cauchy-Schwarz inequality and (7.2) to bound all coordinates $\langle V_j, \theta\rangle$ of the vector
$V^{\mathsf T}\theta$ as follows:

$$|\langle V_j, \theta\rangle| \le \|V\|_\infty \|\theta\|_2 < \frac{1}{2}, \quad j = 1, \ldots, N.$$

(Here $V_j \in \mathbb{R}^m$ denote the columns of the matrix $V$.) This bound means that each coordinate of
$V^{\mathsf T}\theta$ is closer to zero than to any other integer. Thus the vector $V^{\mathsf T}\theta$ itself is closer (in the $\ell_2$ norm)
to the origin than to any other integer vector in $\mathbb{Z}^N$. This implies that

$$\operatorname{dist}(V^{\mathsf T}\theta, \mathbb{Z}^N) = \|V^{\mathsf T}\theta\|_2.$$

Substituting this into (7.1) and dividing both sides by $L$, we obtain

$$u < \sqrt{\log_+ u} \quad \text{where } u = \|V^{\mathsf T}\theta\|_2 / L.$$

But this inequality has no solutions for $u > 0$. This contradiction completes the proof. □


7.2. Small ball probabilities via LCD. The following theorem relates small ball probabilities to 
arithmetic structure, which is measured by LCD. It is a general version of results from [23, 25, 38]. 

Theorem 7.5 (Small ball probabilities via LCD). Consider a random vector $\xi = (\xi_1, \ldots, \xi_N)$,
where $\xi_k$ are i.i.d. copies of a real-valued random variable $\xi$ satisfying (1.2). Consider a matrix
$V \in \mathbb{R}^{m \times N}$. Then for every $L \ge \sqrt{8m/p}$ we have

$$\mathcal{L}(V\xi, t\sqrt{m}) \le \frac{(CL/\sqrt{m})^m}{\det(VV^{\mathsf T})^{1/2}} \Big( t + \frac{\sqrt{m}}{D(V)} \Big)^m, \quad t \ge 0. \tag{7.3}$$

Proof. We shall apply Esseen's inequality for the small ball probabilities of a general random vector
$Y \in \mathbb{R}^m$. It states that

$$\mathcal{L}(Y, \sqrt{m}) \le C^m \int_{B(0,\sqrt{m})} |\phi_Y(\theta)|\, d\theta \tag{7.4}$$

where $\phi_Y(\theta) = \mathbb{E}\exp(2\pi i \langle\theta, Y\rangle)$ is the characteristic function of $Y$ and $B(0, \sqrt{m})$ is the Euclidean
ball centered at the origin and with radius $\sqrt{m}$.

Let us apply Esseen's inequality for $Y = t^{-1}V\xi$, assuming without loss of generality that $t > 0$.
Denoting the columns of $V$ by $V_k$, we express

$$\langle\theta, Y\rangle = \sum_{k=1}^N t^{-1}\langle\theta, V_k\rangle\, \xi_k.$$

By independence of $\xi_k$, this yields

$$\phi_Y(\theta) = \prod_{k=1}^N \phi_k\big(t^{-1}\langle\theta, V_k\rangle\big), \quad \text{where } \phi_k(\tau) = \mathbb{E}\exp(2\pi i \tau \xi_k)$$

are the characteristic functions of $\xi_k$. Therefore, Esseen's inequality (7.4) yields

$$\mathcal{L}(V\xi, t\sqrt{m}) = \mathcal{L}(Y, \sqrt{m}) \le C^m \int_{B(0,\sqrt{m})} \prod_{k=1}^N \big|\phi_k\big(t^{-1}\langle\theta, V_k\rangle\big)\big|\, d\theta. \tag{7.5}$$

Now we evaluate the characteristic functions that appear in this integral. First we apply a
standard symmetrization argument. Let $\xi'$ be an independent copy of $\xi$, and consider the random
vector $\bar{\xi} := \xi - \xi'$. Its coordinates $\bar{\xi}_k$ are i.i.d. random variables with symmetric distribution. It
follows that

$$|\phi_k(\tau)|^2 = \mathbb{E}\exp(2\pi i \tau \bar{\xi}_1) = \mathbb{E}\cos(2\pi\tau\bar{\xi}_1) \quad \text{for all } k.$$

Using the inequality $x \le \exp\big(-\tfrac{1}{2}(1 - x^2)\big)$ that is valid for all $x \ge 0$, we obtain

$$|\phi_k(\tau)| \le \exp\Big( -\frac{1}{2}\, \mathbb{E}\big[1 - \cos(2\pi\tau\bar{\xi}_1)\big] \Big). \tag{7.6}$$

The assumptions on $\xi_k$ imply that the event $\{1 \le |\bar{\xi}_1| \le K\}$ holds with probability at least $p/2$.
Denoting by $\bar{\mathbb{E}}$ the conditional expectation on that event, we obtain

$$\mathbb{E}\big[1 - \cos(2\pi\tau\bar{\xi}_1)\big] \ge \frac{p}{2} \cdot \bar{\mathbb{E}}\big[1 - \cos(2\pi\tau\bar{\xi}_1)\big] \ge \frac{p}{2} \cdot \bar{\mathbb{E}} \min_{q \in \mathbb{Z}} |\tau\bar{\xi}_1 - q|^2. \tag{7.7}$$
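The two elementary inequalities used around (7.6) and (7.7) are easy to spot-check on a grid. The sketch below tests $x \le \exp(-\tfrac12(1 - x^2))$ for $x \ge 0$ and a bound of the type $1 - \cos(2\pi s) \ge c \cdot \operatorname{dist}(s, \mathbb{Z})^2$, here with the illustrative choice $c = 1$; the grids and tolerances are arbitrary, and a grid check is of course not a proof.

```python
import math

def dist_to_Z(x):
    """Distance from the real number x to the nearest integer."""
    return abs(x - round(x))

# Inequality used for (7.6): x <= exp(-(1 - x^2)/2) for x >= 0.
grid_x = [k / 1000 for k in range(0, 3001)]
ok_exp = all(x <= math.exp(-0.5 * (1 - x * x)) + 1e-12 for x in grid_x)

# Inequality behind (7.7): 1 - cos(2*pi*s) >= dist(s, Z)^2 for real s.
grid_s = [k / 1000 for k in range(-2000, 2001)]
ok_cos = all(
    1 - math.cos(2 * math.pi * s) >= dist_to_Z(s) ** 2 - 1e-12
    for s in grid_s
)
```

The first inequality is equivalent to $\log x^2 \le x^2 - 1$; the second follows from $1 - \cos(2\pi s) = 2\sin^2(\pi s)$ and $|\sin(\pi s)| \ge 2\operatorname{dist}(s, \mathbb{Z})$.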








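Substituting this inequality into (7.6) and then back into (7.5), we further derive

$$\begin{aligned}
\mathcal{L}(V\xi, t\sqrt{m}) &\le C^m \int_{B(0,\sqrt{m})} \exp\Big( -\frac{p}{4}\, \bar{\mathbb{E}} \sum_{k=1}^N \min_{q_k \in \mathbb{Z}} \big| t^{-1}\bar{\xi}_1 \langle\theta, V_k\rangle - q_k \big|^2 \Big)\, d\theta \\
&= C^m \int_{B(0,\sqrt{m})} \exp\Big( -\frac{p}{4} f(\theta)^2 \Big)\, d\theta
\end{aligned} \tag{7.8}$$

where

$$f(\theta)^2 := \bar{\mathbb{E}} \min_{q \in \mathbb{Z}^N} \big\| t^{-1}\bar{\xi}_1 V^{\mathsf T}\theta - q \big\|_2^2 = \bar{\mathbb{E}} \operatorname{dist}\big(t^{-1}\bar{\xi}_1 V^{\mathsf T}\theta, \mathbb{Z}^N\big)^2.$$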

The least common denominator $D(V, L)$ will help us estimate the distance to the integer lattice
that appears in the definition of $f(\theta)$. Let us first assume that

$$t \ge t_0 := \frac{2K\sqrt{m}}{D(V, L)}, \tag{7.9}$$

or equivalently that $D(V, L) \ge 2K\sqrt{m}/t$. Then for any $\theta$ appearing in the integral (7.8), that is
for $\theta \in B(0, \sqrt{m})$, one has

$$\|t^{-1}\bar{\xi}_1\theta\|_2 \le \frac{K\sqrt{m}}{t} < D(V).$$

(Here we used that $|\bar{\xi}_1| \le K$ holds on the event over which the conditional expectation $\bar{\mathbb{E}}$ is taken.)
By the definition of $D(V)$, this implies that

$$\operatorname{dist}\big(V^{\mathsf T}(t^{-1}\bar{\xi}_1\theta), \mathbb{Z}^N\big) \ge L\sqrt{\log_+ \frac{\|V^{\mathsf T}(t^{-1}\bar{\xi}_1\theta)\|_2}{L}}.$$

Recalling the definition of $f$ and using that $|\bar{\xi}_1| \ge 1$ on the event over which the conditional
expectation $\bar{\mathbb{E}}$ is taken, we obtain

$$f(\theta)^2 \ge L^2 \log_+ \frac{\|V^{\mathsf T}\theta\|_2}{Lt}.$$

Substituting this bound into (7.8), we obtain

$$\mathcal{L}(V\xi, t\sqrt{m}) \le C^m \int_{B(0,\sqrt{m})} \exp\Big( -\frac{pL^2}{4} \log_+ \frac{\|V^{\mathsf T}\theta\|_2}{Lt} \Big)\, d\theta.$$

One can estimate this integral in a standard way.

Let us get rid of $V$ in the integrand by an appropriate change of variable. Using a singular value
decomposition of $V$, one can replace $V^{\mathsf T}\theta$ by $\Sigma\theta$ where $\Sigma \in \mathbb{R}^{m \times m}$ is a diagonal matrix with singular
values of $V$ on the diagonal. Next, we change variables to $\Sigma\theta/(Lt) = z$. Since $\det\Sigma = \det(VV^{\mathsf T})^{1/2}$,
this yields

$$\mathcal{L}(V\xi, t\sqrt{m}) \le \frac{(CLt)^m}{\det(VV^{\mathsf T})^{1/2}} \int_{\mathbb{R}^m} \exp\Big( -\frac{pL^2}{4} \log_+ \|z\|_2 \Big)\, dz. \tag{7.10}$$

We evaluate the integral by breaking it into two parts:

$$\int_{\mathbb{R}^m} \exp\Big( -\frac{pL^2}{4} \log_+ \|z\|_2 \Big)\, dz = \int_{B(0,1)} 1\, dz + \int_{B(0,1)^c} \|z\|_2^{-pL^2/4}\, dz. \tag{7.11}$$
Let us start with the second integral. Passing to the polar coordinates $(r, \phi) \in \mathbb{R}_+ \times S^{m-1}$ where
$dz = r^{m-1}\, dr\, d\phi$, we obtain for any $q > 0$ that

$$\int_{B(0,1)^c} \|z\|_2^{-q}\, dz = \int_1^\infty dr \int_{S^{m-1}} r^{-q} r^{m-1}\, d\phi = \sigma_{m-1}(S^{m-1}) \int_1^\infty r^{m-q-1}\, dr,$$

where $\sigma_{m-1}(S^{m-1})$ is the surface area of the unit sphere. Recall that

$$\sigma_{m-1}(S^{m-1}) = \frac{2\pi^{m/2}}{\Gamma(m/2)} \le \Big(\frac{C}{\sqrt{m}}\Big)^m$$

and that

$$\int_1^\infty r^{m-q-1}\, dr = \frac{1}{q - m} \le 1 \quad \text{for } q \ge 2m.$$

This yields

$$\int_{B(0,1)^c} \|z\|_2^{-q}\, dz \le \Big(\frac{C}{\sqrt{m}}\Big)^m \quad \text{for } q \ge 2m.$$

We use this bound for $q = pL^2/4$, where $q \ge 2m$ by assumption. It follows that the integral over
$B(0,1)^c$ in (7.11) is bounded by $(C/\sqrt{m})^m$. Moreover, the integral over $B(0,1)$ equals the volume
of the unit ball $B(0,1)$, which is also bounded by $(C/\sqrt{m})^m$. Thus the right hand side of (7.11) is
bounded by $2(C/\sqrt{m})^m$. Substituting it into (7.10), we obtain

$$\mathcal{L}(V\xi, t\sqrt{m}) \le \frac{(CL/\sqrt{m})^m}{\det(VV^{\mathsf T})^{1/2}}\, t^m. \tag{7.12}$$

This completes the proof in the case where $t \ge t_0$ as specified in (7.9).

In the opposite case where $t < t_0$, it is enough to use that $\mathcal{L}(V\xi, t\sqrt{m}) \le \mathcal{L}(V\xi, t_0\sqrt{m})$ and apply
the inequality (7.12) for $t_0$. This completes the proof of Theorem 7.5. □


7.3. Special cases: sums of independent random variables. Let us state an immediate
consequence of Theorem 7.5 in the important special case where $m = 1$. In this case, $V\xi$ becomes
a sum of independent random variables.

Corollary 7.6 (Small ball probabilities for sums). Let $\xi_k$ be i.i.d. copies of a real-valued random
variable $\xi$ satisfying (1.2). Let $a = (a_1, \ldots, a_N) \in \mathbb{R}^N$. Then for every $L \ge \sqrt{8/p}$ we have

$$\mathcal{L}\Big( \sum_{k=1}^N a_k\xi_k,\; t \Big) \le \frac{CL}{\|a\|_2} \Big( t + \frac{1}{D(a, L)} \Big), \quad t \ge 0.$$


This corollary was proved in [38]; similar versions appeared in [23, 25]. It gives a non-trivial 
probability bound when the coefficient vector a is sufficiently unstructured, i.e. when D(a, L) is 
large enough. In the situation where no information is known about the structure of a, the following 
result can be useful. 


Lemma 7.7 (Small ball probabilities: a simple bound). Let $\xi_k$ be independent random variables
satisfying (1.2), and let $a_k$ be real numbers such that $\sum_{k=1}^N a_k^2 = 1$. Then

$$\mathcal{L}\Big( \sum_{k=1}^N a_k\xi_k,\; c \Big) \le 1 - c'$$

where $c$ and $c'$ are positive numbers that may only depend on $p$ and $K$.

Proof. We will consider separately the cases where $a$ has a large coordinate and where it does not.
Assume first that

$$\|a\|_\infty \ge \nu := \frac{1}{4CL},$$

where $C$ is the constant appearing in Corollary 7.6. Choose a coordinate $k_0$ such that $|a_{k_0}| = \|a\|_\infty$.
Applying Lemma 3.2, we obtain

$$\mathcal{L}\Big( \sum_{k=1}^N a_k\xi_k,\; \nu \Big) \le \mathcal{L}(a_{k_0}\xi_{k_0}, \nu) \le \mathcal{L}(\xi_{k_0}, 1) \le 1 - p.$$

In the opposite case where $\|a\|_\infty < \nu$, Proposition 7.4 implies $D(a, L) \ge 1/(2\nu)$. Combining this
with Corollary 7.6, we obtain

$$\mathcal{L}\Big( \sum_{k=1}^N a_k\xi_k,\; \nu \Big) \le CL \cdot 3\nu \le \frac{3}{4},$$

which completes the proof. □


7.4. Special cases: projections of random vectors. Another class of examples where Theorem 7.5 is useful is for projections of a random vector $\xi$ onto a fixed subspace $E$ of $\mathbb{R}^N$. Equivalently,
this result allows us to estimate the distances between random vectors and fixed subspaces, since
$\operatorname{dist}(X, H) = \|P_{H^\perp} X\|_2$.

To deduce such estimates, we will make the matrix $V$ in Theorem 7.5 encode an orthogonal projection onto $E$. Let us pause to interpret the LCD of such a matrix $V$ as the LCD of the subspace $E$
itself.


Definition 7.8 (LCD of a subspace). Fix $L > 0$. For a subspace $E \subseteq \mathbb{R}^N$, the least common
denominator is defined as

$$D(E) = D(E, L) = \inf\{D(v, L) : v \in S_E\}.$$

By now, we have defined LCD of vectors, matrices, and subspaces. The following lemma relates
them together.

Lemma 7.9 (LCD of subspaces vs. matrices). Let $E$ be a subspace of $\mathbb{R}^N$. Then

(1) $D(E) = \inf\Big\{ \|x\|_2 : x \in E,\ \operatorname{dist}(x, \mathbb{Z}^N) < L\sqrt{\log_+ \frac{\|x\|_2}{L}} \Big\}$.

(2) Let $U \in \mathbb{R}^{N \times m}$ be a matrix such that $U^{\mathsf T}U = I_m$ and $\operatorname{Im}(U) = E$. Then $D(E) = D(U^{\mathsf T})$.

Proof. The first part follows directly from the definition. To prove the second part, note that
according to Definition 7.1 we have

$$D(U^{\mathsf T}) = \inf\Big\{ \|\theta\|_2 : \theta \in \mathbb{R}^m,\ \operatorname{dist}(U\theta, \mathbb{Z}^N) < L\sqrt{\log_+ \frac{\|U\theta\|_2}{L}} \Big\}.$$

Let us change variable to $x = U\theta$. The assumptions on $U$ imply that $\|x\|_2 = \|\theta\|_2$ and as $\theta$ runs
over $\mathbb{R}^m$, $x$ runs over $\operatorname{Im}(U) = E$. We finish by applying the first part of this lemma. □



The following corollary is a version of a result from [25]. 


Corollary 7.10 (Small ball probabilities for projections). Consider a random vector $\xi = (\xi_1, \ldots, \xi_N)$,
where $\xi_k$ are i.i.d. copies of a real-valued random variable $\xi$ satisfying (1.2). Let $E$ be a subspace
of $\mathbb{R}^N$ with $\dim(E) = m$, and let $P_E$ denote the orthogonal projection onto $E$. Then for every
$L \ge \sqrt{8m/p}$ we have

$$\mathcal{L}(P_E\xi, t\sqrt{m}) \le \Big(\frac{CL}{\sqrt{m}}\Big)^m \Big( t + \frac{\sqrt{m}}{D(E, L)} \Big)^m, \quad t \ge 0. \tag{7.13}$$

Proof. Choose a matrix $U \in \mathbb{R}^{N \times m}$ so that $U^{\mathsf T}U = I_m$ and $UU^{\mathsf T} = P_E$. Then $U$ acts as an isometric
embedding from $\mathbb{R}^m$ into $\mathbb{R}^N$, i.e. $\|Ux\|_2 = \|x\|_2$ for all $x \in \mathbb{R}^m$. This yields

$$\mathcal{L}(P_E\xi, t\sqrt{m}) = \mathcal{L}(UU^{\mathsf T}\xi, t\sqrt{m}) = \mathcal{L}(U^{\mathsf T}\xi, t\sqrt{m}).$$

We apply Theorem 7.5 for $V = U^{\mathsf T}$ and note that $\det(VV^{\mathsf T}) = \det(U^{\mathsf T}U) = \det(I_m) = 1$. Thus
$\mathcal{L}(P_E\xi, t\sqrt{m})$ gets bounded by the same quantity as in the right hand side of (7.13) except for
$D(V)$. It remains to use Lemma 7.9, which yields $D(V) = D(U^{\mathsf T}) = D(E)$. □

8. Distances between random vectors and subspaces: statement of the result


Our next goal is to prove a lower bound for the distance between independent random vectors and 
subspaces. For continuous distributions, this was achieved in Lemma 5.4. Doing this for general, 
possibly discrete, distributions, is considerably more difficult. The following result is a version of 
Lemma 5.4 for general distributions. 


Theorem 8.1 (Distance between random vectors and subspaces). Let $H \in \mathbb{C}^{N \times n}$ be a random
matrix which satisfies Assumptions 1.1³ and 1.4, and assume that $n = (1 - \varepsilon)N$ for some $\varepsilon \in (2/n, c)$. Let $Z \in \mathbb{C}^N$ be a random vector independent of $H$, and whose coordinates are i.i.d.
random variables satisfying the same distributional assumptions as specified in Assumption 1.4.
Then

-ieN 


P jdist(Z, Im(iL)) < tVsN and Bh,m^ < C^r + ^ + e 


r > 0. 


A version of this theorem was proved in [25] in the simpler situation where the entries of $H$ are
real-valued and all independent. In this simpler case, [25] gives the following optimal bound:

$$\mathbb{P}\{\operatorname{dist}(Z, \operatorname{Im}(H)) \le \tau\sqrt{\varepsilon N}\} \le (C\tau)^{\varepsilon N} + e^{-cN}.$$

We do not know if the same bound can be proved in the setting of Theorem 8.1.


To prove Theorem 8.1, we will first reduce it to a problem over the reals, much like we did in
Section 5.3.3. Then, expressing the distance $\operatorname{dist}(Z, \operatorname{Im}(H))$ as the norm of the projection of $Z$
onto $\operatorname{Im}(H)^\perp = \ker(H^*)$, we should be able to apply Corollary 7.10. However, for the resulting
probability bound (7.13) to be meaningful, we would need to show that the least common denominator $D(\ker(H^*), L)$ is large, or in other words, that this subspace is unstructured. This will be
a major step in the argument. Eventually we will achieve this in Section 12, which will allow us to
quickly finalize the proof of Theorem 8.1.

In preparation for the proof of Theorem 8.1, let us express the distance we need to estimate as
follows:

$$\operatorname{dist}(Z, \operatorname{Im}(H)) = \|P_{\operatorname{Im}(H)^\perp} Z\|_2 = \|P_{\ker(B)} Z\|_2, \quad \text{where } B = H^* \in \mathbb{C}^{n \times N}. \tag{8.1}$$

Our goal is to show that $\ker(B)$ is arithmetically unstructured.


8.1. Transferring the problem from $\mathbb{C}$ to $\mathbb{R}$. Similarly to our argument for continuous distributions, we will now transfer the distance problem from the complex to the real field. In Section 5.3.3, we introduced the operation $z \mapsto \tilde{z}$ that makes a complex vector $z = x + iy$ in $\mathbb{C}^N$ real
by defining $\tilde{z} := \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^{2N}$. We also introduced this operation for subspaces $E$ of $\mathbb{C}^N$ by defining

$$\tilde{E} = \{\tilde{z} : z \in E\} \subseteq \mathbb{R}^{2N}.$$

In the analysis of the distance problem for continuous distributions, we did not need to know
anything about the subspaces $H'_j$ beyond their dimensions. This time, our analysis will be sensitive
to the structure of the subspace $\ker(B)$. For this purpose, we will need to transfer the matrix $B$
from the complex to the real field. We can do this in a way that preserves matrix-vector multiplication as
follows:

$$\text{For } B = R + iT \in \mathbb{C}^{n \times N}, \text{ define } \tilde{B} = \begin{bmatrix} R & -T \\ T & R \end{bmatrix} \in \mathbb{R}^{2n \times 2N}. \tag{8.2}$$


³ Assumption 1.1 is formulated for square random matrices. For rectangular matrices, one of the entries $A_{ij}$ or $A_{ji}$
may not exist. In this case, we assume that the other entry is independent of the rest.

We already observed two elementary properties of the operation $z \mapsto \tilde{z}$ in Lemma 5.2; let us
record one more straightforward fact.

Lemma 8.2 (Elementary property of operation $x \mapsto \tilde{x}$). For a complex matrix $B$ and a vector $z$,
one has $\widetilde{Bz} = \tilde{B}\tilde{z}$, and consequently $\widetilde{\ker(B)} = \ker(\tilde{B})$.
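Lemma 8.2 can be verified numerically on a small example. The sketch below assumes the block convention $\tilde{B} = \begin{bmatrix} R & -T \\ T & R \end{bmatrix}$ for $B = R + iT$ and $\tilde{z} = (x, y)$ for $z = x + iy$, which is the convention under which real matrix-vector multiplication reproduces complex multiplication; the helper names and the sample data are illustrative.

```python
def realify_vector(z):
    """Map z = x + iy in C^N to (x, y) in R^{2N}."""
    return [c.real for c in z] + [c.imag for c in z]

def realify_matrix(B):
    """Map B = R + iT in C^{n x N} to the real 2n x 2N block matrix [[R, -T], [T, R]]."""
    top = [[c.real for c in row] + [-c.imag for c in row] for row in B]
    bottom = [[c.imag for c in row] + [c.real for c in row] for row in B]
    return top + bottom

def matvec(A, v):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Arbitrary illustrative data: B is 2 x 3, z is in C^3.
B = [[1 + 2j, -1j, 0.5 + 0j],
     [2 - 1j, 3 + 1j, -1 + 1j]]
z = [1 - 1j, 2 + 0.5j, -3j]

lhs = realify_vector(matvec(B, z))                   # realification of Bz
rhs = matvec(realify_matrix(B), realify_vector(z))   # real matrix times real vector
```

The two vectors agree coordinate by coordinate, and in particular $\|Bz\|_2 = \|\tilde{B}\tilde{z}\|_2$, so $Bz = 0$ exactly when $\tilde{B}\tilde{z} = 0$.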

Let us return to the distance problem (8.1). Applying Lemma 5.3 for $E = \ker B$, we conclude
that

$$\mathcal{L}(P_{\ker(B)} Z, r) \le \mathcal{L}\big(P_{\widetilde{\ker(B)}} \hat{Z}, 2r\big)^{1/2}.$$

Using the interpretation of distance as norm of projection in (8.1), we can summarize the first step
toward the proof of Theorem 8.1. We showed that

$$\mathbb{P}\{\operatorname{dist}(Z, \operatorname{Im}(H)) \le \tau\sqrt{\varepsilon N}\} \le \mathcal{L}\big(P_{\ker(\tilde{B})} \hat{Z}, 2\tau\sqrt{\varepsilon N}\big)^{1/2}. \tag{8.3}$$

Recall that here, according to Lemma 8.2, $\ker\tilde{B} = \widetilde{\ker B}$, where $\tilde{B}$ is the random matrix from
(8.2) and $\hat{Z} \in \mathbb{R}^{2N}$ is a random vector. Specifically, $T$ is a fixed $n \times N$ matrix and $R$ is an $n \times N$
random matrix, which satisfies the structural and distributional requirements of Assumptions 1.1
and 1.4 (except that $R$ is entirely real). The coordinates of the random vector $\hat{Z}$ are i.i.d. copies of
a real random variable $\xi$ satisfying (1.2).

9. Kernels of random matrices are incompressible 

9.1. Compressible and incompressible vectors. Before we can show that the kernel of $B$
consists of arithmetically unstructured vectors, we will prove a much simpler result. It states that
the kernel of $B$ consists of incompressible vectors, that is, those whose mass is not concentrated on a small
number of coordinates. The partition of the space into compressible and incompressible vectors
has been instrumental in arguments leading to invertibility of random matrices, see [23, 25, 38].

Definition 9.1 (Compressible and incompressible vectors). Let $c_0, c_1 \in (0,1)$ be two constants. A
vector $z \in \mathbb{C}^N$ is called sparse if $|\operatorname{supp}(z)| \le c_0N$. A vector $z \in S_{\mathbb{C}}^{N-1}$ is called compressible if
$z$ is within Euclidean distance $c_1$ from the set of all sparse vectors. A vector $z \in S_{\mathbb{C}}^{N-1}$ is called
incompressible if it is not compressible. The sets of compressible and incompressible vectors in
$S_{\mathbb{C}}^{N-1}$ will be denoted by Comp and Incomp respectively.

The definition above depends on the choice of the constants $c_0, c_1$. These constants will be chosen
in Proposition 9.4 and remain fixed throughout the paper.

As we already announced, our goal in this section is to prove that, with high probability, the 
kernel of R consists entirely of incompressible vectors. We will deduce this by providing a uniform 
lower bound for |JRjzr ||2 for all compressible vectors z. 

9.2. Relating $\|Bz\|_2$ to a sum of independent random variables. Let us fix a vector $z$ for
now. We would like to express $\|Bz\|_2^2$ as a sum of independent random variables, and then to use
bounds on small ball probabilities from Section 7. Using the real version $\tilde{B}$ of the matrix $B$, and
the real version $\tilde{z}$ of the vector $z$ we introduced in Section 8.1, we can write

$$\|Bz\|_2^2 = \|\tilde{B}\tilde{z}\|_2^2 = \|Rx + Ty\|_2^2 + \|Ry - Tx\|_2^2. \tag{9.1}$$

Let us fix a subset $J \subset [N]$. Dropping the coefficients of the vectors $Rx + Ty$ and $Ry - Tx$ indexed
by $J$, we obtain

$$\|Bz\|_2^2 \ge \|R_{J^c \times [N]}\, x + a\|_2^2 + \|R_{J^c \times [N]}\, y - b\|_2^2,$$

where $a = T_{J^c \times [N]}\, y$ and $b = T_{J^c \times [N]}\, x$ are fixed vectors.

Further, let us decompose $R_{J^c \times [N]} = R_{J^c \times J} + R_{J^c \times J^c}$, where $R_{J^c \times I}$ denotes the matrix $R_{J^c \times [N]}$
whose columns that do not belong to $I$ are replaced by zeros. Assumption 1.1 implies that these
two components are independent, and moreover the first one, $R_{J^c \times J}$, has independent entries. So
let us condition on the second component, $R_{J^c \times J^c}$. Absorbing its contribution into $a$ and $b$, we
obtain

$$\|Bz\|_2^2 \ge \|R_{J^c \times J}\, x + a'\|_2^2 + \|R_{J^c \times J}\, y - b'\|_2^2,$$

where $a'$ and $b'$ are fixed vectors. Expanding the matrix-vector multiplication, we arrive at the
bound

$$\|Bz\|_2^2 \ge \sum_{i \in [n] \setminus J} (X_i^2 + Y_i^2), \tag{9.2}$$

where

$$X_i = \sum_{j \in J} R_{ij} x_j + a'_i, \qquad Y_i = \sum_{j \in J} R_{ij} y_j - b'_i \tag{9.3}$$

and $a'_i$ and $b'_i$ are fixed numbers. The sum in (9.2) should be convenient to control, since all $R_{ij}$
appearing in (9.3) are independent random variables.

9.3. A lower bound on $\|Bz\|_2$ for compressible vectors. We start with a simple and general
lower bound on $\|Bz\|_2$ for a fixed vector $z$.

Proposition 9.2 (Matrix acting on a fixed vector: simple bound). Let $n \le N \le 2n$, and $B \in \mathbb{C}^{n \times N}$
be a random matrix satisfying Assumptions 1.1 and 1.4. Then for any fixed vector $z \in \mathbb{C}^N$ with
$\|z\|_2 = 1$ we have

$$\mathbb{P}\{\|Bz\|_2 \le c\sqrt{n}\} \le e^{-cn}.$$

Proof. Let $z = x + iy$, and choose $J$ to be the set of indices of the $N/4$ largest coordinates of $z$.
Since $z$ is a unit vector, we have

$$\|x_J\|_2^2 + \|y_J\|_2^2 = \|z_J\|_2^2 \ge \frac{1}{4}.$$

It follows that either $x_J$ or $y_J$ has norm at least $1/4$. Without loss of generality, let us assume that
$\|x_J\|_2 \ge 1/4$.

Dropping the terms $Y_i$ from (9.2), we see that

$$\|Bz\|_2^2 \ge \sum_{i \in [n] \setminus J} X_i^2 \quad\text{where}\quad X_i = \sum_{j \in J} R_{ij} x_j + a'_i. \tag{9.4}$$

By Assumption 1.1, $R_{ij}$ are i.i.d. random variables. Moreover, their distribution satisfies
Assumption 1.4, so we can apply Lemma 7.7 and conclude that for each $i$,

$$\mathbb{P}\{|X_i| \le c\} \le 1 - c'. \tag{9.5}$$

Assume that $\|Bz\|_2^2 \le \alpha c^2 n$ where $\alpha \in (0,1)$ is a number to be chosen later. By (9.4), this yields
$\sum_{i \in [n] \setminus J} X_i^2 \le \alpha c^2 n$, which in turn implies that $|X_i| \le c$ for at least $|[n] \setminus J| - \alpha n$ random variables
$X_i$ in this sum. Therefore, using independence we obtain

$$\mathbb{P}\{\|Bz\|_2^2 \le \alpha c^2 n\} \le \binom{|[n] \setminus J|}{\alpha n} (1 - c')^{|[n] \setminus J| - \alpha n} \le \Big(\frac{e}{\alpha}\Big)^{\alpha n} (1 - c')^{(1/2 - \alpha) n}. \tag{9.6}$$

The second inequality holds if we choose $\alpha$ small enough so that $\alpha n < n/4$, while $|[n] \setminus J| \ge
n - N/4 \ge n/2$ by assumption. The probability bound in (9.6) can be made smaller than $e^{-cn}$ for
some $c = c(c') > 0$ by choosing $\alpha = \alpha(c') > 0$ sufficiently small. This completes the proof. □

We are going to argue that the lower bound in Proposition 9.2 holds not only for a fixed unit
vector $z$ but also uniformly over $z \in$ Comp. This will follow by combining Proposition 9.2 with the
following standard construction of a net for the set of compressible vectors.

Lemma 9.3 (Net for compressible vectors). For any $\delta \in (0,1)$, there exists a $(2c_1)$-net of the set
Comp of cardinality at most



Proof. First we construct a ci-net of the set of sparse vectors. This set is a union of coordinate 
subspheres of for all sets J C [N] of cardinality coN. For a fixed J, the standard volume 
argument yields a ci-net of of cardinality at most (C\/cqc\) c ° n . A union bound over ( CqN ) < 
choices of J produces a ci-net of the set of sparse vectors with cardinality at most [C / cqc\) c ° n . By 
approximation, this is automatically a (2ci)-net for the set of compressible vectors. □ 

Proposition 9.4 (A lower bound on the set of compressible vectors). Let $B \in \mathbb{C}^{n \times N}$ be a random
matrix satisfying Assumptions 1.1 and 1.4. Then one can choose constants $c_0, c_1 \in (0,1)$ in
Definition 9.1 depending on $p$ and $K$ only, and so that

$$\mathbb{P}\Big\{ \inf_{z \in \mathrm{Comp}} \|Bz\|_2 \le c\sqrt{n} \ \text{and}\ \mathcal{B}_{B,M} \Big\} \le e^{-cn}.$$


Proof. Let us choose $c_1 = c/(4M)$ and let $\mathcal{N}$ be a $(2c_1)$-net of the set Comp given by Lemma 9.3.

Assume the bad event in Proposition 9.4 occurs, so that $\|Bz\|_2 \le c\sqrt{n}$ for some $z \in$ Comp and
$\|B\| \le M\sqrt{n}$. Choose $z_0 \in \mathcal{N}$ such that $\|z - z_0\|_2 \le 2c_1$. By the triangle inequality, we have

$$\|Bz_0\|_2 \le \|Bz\|_2 + \|B\| \, \|z - z_0\|_2 \le c\sqrt{n} + M\sqrt{N} \cdot 2c_1 \le 2c\sqrt{n}.$$

In the last inequality, we used the definition of $c_1$ and the fact that $N \le 2n$.

Furthermore, Proposition 9.2 states that for fixed $z_0$, the inequality $\|Bz_0\|_2 \le 2c\sqrt{n}$ holds with
probability at most $e^{-cn}$. Combining this with the union bound over $z_0 \in \mathcal{N}$ and using the
cardinality of $\mathcal{N}$ given by Lemma 9.3, we conclude that the bad event in Proposition 9.4 holds with
probability at most

$$|\mathcal{N}| \cdot e^{-cn},$$

where $|\mathcal{N}|$ is bounded as in Lemma 9.3. Choosing $c_0$ so that the last expression does not exceed $e^{-cn/2}$ completes the proof.


□ 


Proposition 9.4 implies in particular that with high probability the kernel of $B$ consists of
incompressible vectors:

$$\ker B \cap S_{\mathbb{C}}^{N-1} \subset \mathrm{Incomp}.$$


10. Small ball probabilities via real-imaginary correlations 

Recall that our big goal is to show that the kernel of B is unstructured, which means that all 
vectors in ker(B) have large LCD. We may try to approach this problem using the same general 
line of attack as in Section 9. Namely, we can try to bound $\|Bz\|_2$ below uniformly on the set of
vectors with small LCD. 

This will require us to considerably sharpen the tools we developed in Section 9 - small ball
probabilities and constructions of nets. More precisely, we would like to make the probability in
Proposition 9.2 exponential in $2n$ rather than $n$; an ideal bound for us would be

$$\mathbb{P}\{\|Bz\|_2 \le t\sqrt{n}\} \le \Big(Ct + \frac{1}{\sqrt{n}}\Big)^{2n}, \quad t \ge 0. \tag{10.1}$$

A bound like this will be crucial when we combine it with a union bound over a net, just like in
Section 9. But there the nets were for compressible vectors $z \in \mathbb{C}^N$. Now we will have to handle
much larger sets: the level sets of LCD. As we will describe in Section 11, the nets of these level
sets are exponential in $2N$. To control them, it is crucial to have a small ball probability bound
that is also exponential in $2n$. (The difference between $2N$ and $2n$ is minor and can intuitively be
neglected since $N = (1 + \varepsilon)n$.)

At first glance, this should be possible because our problem is over C, so the dimension there 
should double compared to R. But recall that according to Assumption 1.1, the imaginary part of 
B is fixed, so there is no extra randomness that could help us double the exponent. 

One can even come up with concrete examples where the bound (10.1) fails. Assume that the entries
of $B$ are real independent random variables with bounded densities, and that $z$ is a real vector.
Since the matrix $B$ has $n$ rows, the optimal small ball probability is

$$\mathbb{P}\{\|Bz\|_2 \le t\sqrt{n}\} \le (Ct)^n. \tag{10.2}$$

The same is true for complex vectors z with very correlated real and imaginary parts, such as for 
z = x + ix. 

These observations might lead us to the conclusion that it must be impossible to combine the 
small ball probabilities with nets. However, one can notice that the examples of vectors z we just 
considered are special. The real vectors $z$ are contained in the $N$-dimensional real sphere, and
this sphere has a net exponential in N rather than 2 N. The same holds for vectors of the type 
z = x + ix. So these special vectors have smaller nets, which can hopefully be balanced by the 
small ball probabilities like (10.2). 

For other, more “typical” vectors, we might hope for stronger probability bounds. Consider, for
example, the vector $z = x + iy$, where $x$ and $y$ have disjoint support and both have norms $\Omega(1)$.
Still assuming that $B$ is a real matrix, we then have $\|Bz\|_2^2 = \|Bx\|_2^2 + \|By\|_2^2$. The assumption
of disjoint support implies that $Bx$ and $By$ are independent, and $\|Bz\|_2^2$ is thus a sum of
$2n$ independent random variables (the row-vector products). So we do have a double amount of
randomness here, and

$$\mathbb{P}\{\|Bz\|_2 \le t\sqrt{n}\} \le (Ct)^{2n}.$$

Such probability bounds can balance a net for the whole sphere of $\mathbb{C}^N$, which is exponential in $2N$.
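
The disjoint-support example can be checked numerically. The sketch below is only an illustration: the sizes, the Gaussian entries and the particular supports are arbitrary choices, not taken from the paper. It verifies that for a real matrix $B$ the squared norm of $Bz$ splits into the contributions of $x$ and $y$, and that $Bx$ and $By$ are built from disjoint sets of columns of $B$, which is the source of their independence.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 5, 8
B = rng.standard_normal((n, N))      # a real matrix, as in the example

x = np.zeros(N)
y = np.zeros(N)
x[:4] = rng.standard_normal(4)       # supp(x) = {0, 1, 2, 3}
y[4:] = rng.standard_normal(4)       # supp(y) = {4, 5, 6, 7}
z = x + 1j * y

# For real B, the real part of Bz is Bx and the imaginary part is By.
lhs = np.linalg.norm(B @ z) ** 2
rhs = np.linalg.norm(B @ x) ** 2 + np.linalg.norm(B @ y) ** 2

# Bx only involves columns 0..3 of B, and By only columns 4..7.
bx_cols = B[:, :4] @ x[:4]
by_cols = B[:, 4:] @ y[4:]
```

Here `lhs` and `rhs` agree up to rounding, and `bx_cols`, `by_cols` reproduce `B @ x` and `B @ y` from disjoint blocks of columns.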

Guided by these examples, we may surmise that the small ball probabilities for Bz and the 
cardinalities of nets for vectors z both depend on the correlation of real and the imaginary parts 
of z. Exploring this interaction in search for tight matching bounds for both quantities will be the 
main technical difficulty in proving Theorem 8.1. We will get a hold of small ball probabilities in 
the current section, and of cardinalities of nets in Section 11. 


10.1. Toward a more sensitive bound. We start by representing $\|Bz\|_2^2$ as a sum of independent
random variables exactly as in Section 9.2, leading up to (9.2). In a moment, we will apply
Littlewood-Offord theory for each term of the sum in (9.2). To do this, we express these terms as
functions of the rows of $R_{J^c \times J}$ as follows:

$$\|Bz\|_2^2 \ge \sum_{i \in [n] \setminus J} (X_i^2 + Y_i^2) = \sum_{i \in [n] \setminus J} \|V_J (R_i)_J - u_i\|_2^2. \tag{10.3}$$

Here

$$V = \begin{bmatrix} x^{\mathsf T} \\ y^{\mathsf T} \end{bmatrix} \in \mathbb{R}^{2 \times N}$$

is a fixed matrix, $R_i$ denotes the $i$-th row of $R$, and $u_i \in \mathbb{R}^2$ are fixed vectors.

Note that at this time we have three different ways to represent a complex vector $z \in \mathbb{C}^N$: the
usual way $z = x + iy$, as a long real vector $\tilde{z} = \binom{x}{y} \in \mathbb{R}^{2N}$, and as a $2 \times N$ real matrix $V$ as above.

All ( Ri)j in (10.3) are independent real random vectors with all independent coordinates. We 
can now apply Theorem 7.5 in dimension m = 2 and for L = 4/^yp. It yields 

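$$\mathbb{P}\{\|V_J (R_i)_J - u_i\|_2 \le t\} \le \frac{C}{\det(V_J V_J^{\mathsf T})^{1/2}} \Big(t + \frac{1}{D_2(V_J)}\Big)^2, \quad t \ge 0. \tag{10.4}$$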

Here we use the notation $D_2(V_J)$ to emphasize that the least common denominator used in this
application of Theorem 7.5 is for $2 \times |J|$ matrices, as opposed to the one for vectors which we will
focus on later.


10.2. Disregarding the arithmetic structure. The small ball probability bound (10.4) relies
on two different qualities of $z$. First, the arithmetic structure of $z$ is reflected in the least common
denominator $D_2(V_J)$. Second, the correlation between real and imaginary parts of $z_J$ is measured
by the term $\det(V_J V_J^{\mathsf T})^{1/2}$.

In this particular place of the argument, we may essentially disregard the arithmetic structure
of $z$. One can get rid of $D_2(V_J)$ using Proposition 7.4, which states that

$$D_2(V_J) \ge \frac{1}{2\|V_J\|_\infty}. \tag{10.5}$$

To bound $\|V_J\|_\infty$, let us introduce a set of small coordinates as follows.

Definition 10.1 (Small coordinates). Fix $\delta \in (0,1)$ and let $z \in \mathbb{C}^N$. We will denote by sm(z) the
set of indices of all except the $\delta N$ largest (in the absolute value) coordinates of $z$. If some of the
coordinates of $z$ are equal, the ties are broken arbitrarily.


If $z$ is a unit vector in $\mathbb{C}^N$ and $J$ is a subset of sm(z), a simple application of Markov’s inequality
yields $\|z_J\|_\infty \le 1/\sqrt{\delta N}$. Moreover, by definition of $V$, we have $\|V_J\|_\infty = \|z_J\|_\infty$. Thus

$$\|V_J\|_\infty \le \frac{1}{\sqrt{\delta N}}.$$

Substituting this into (10.5), we conclude that

$$D_2(V_J) \ge \frac{1}{2}\sqrt{\delta N}. \tag{10.6}$$


This crude estimate leads to the appearance of the term $1/\sqrt{\varepsilon N}$ in Theorem 8.1. One can probably
remove this term by involving the arithmetic structure. However, this would come at a price of a 
significant increase of the complexity of the argument, so we did not pursue this direction. 


10.3. Quantifying the real-imaginary correlation. The determinant $\det(V_J V_J^{\mathsf T})^{1/2}$ measures
the correlation between real and imaginary parts of $z_J$. For example, if the real and imaginary
parts are equal to each other, then the determinant vanishes, and the small ball probability bound
(10.4) becomes useless.

To make the bound as strong as possible, one would choose the subset $J$ so that, on the one
hand, it lies in sm(z) to ensure (10.6), and on the other hand, the determinant $\det(V_J V_J^{\mathsf T})^{1/2}$ is
maximized. This motivates the following definition. 

Definition 10.2 (Real-complex correlation). For $z \in \mathbb{C}^N$ and $\delta \in (0,1)$, we define

$$d(z) = \max\big\{ \det(V_J V_J^{\mathsf T})^{1/2} : J \subset \mathrm{sm}(z),\ |J| = \delta N \big\}.$$

Clearly, $d(z) \in [0,1]$ for any unit vector $z$.
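
For very small $N$, the quantity $d(z)$ can be computed by brute force directly from Definitions 10.1 and 10.2. The following Python sketch is meant only as an illustration; the helper names `small_coords` and `corr_d`, the rounding of $\delta N$ to an integer, and the tie-breaking by a stable sort are assumptions made here, not choices from the paper.

```python
import itertools
import numpy as np

def small_coords(z, delta):
    """Indices in sm(z): all except the delta*N largest coordinates in modulus."""
    k = int(round(delta * len(z)))                  # number of large coordinates removed
    order = np.argsort(-np.abs(z), kind="stable")   # ties broken by position
    return np.sort(order[k:])

def corr_d(z, delta):
    """Brute-force d(z): maximize det(V_J V_J^T)^(1/2) over J in sm(z), |J| = delta*N."""
    z = np.asarray(z, dtype=complex)
    k = int(round(delta * len(z)))
    V = np.vstack([z.real, z.imag])                 # the 2 x N matrix with rows x and y
    best = 0.0
    for J in itertools.combinations(small_coords(z, delta), k):
        VJ = V[:, list(J)]
        gram_det = np.linalg.det(VJ @ VJ.T)
        best = max(best, float(np.sqrt(max(gram_det, 0.0))))  # clip rounding noise
    return best
```

As expected from the discussion above, `corr_d` vanishes for real vectors and for vectors of the form $x + ix$, while a vector whose real and imaginary parts have disjoint supports gives a strictly positive value.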

Choosing $J$ that achieves the maximum in the definition of $d(z)$ and using the bound (10.6), we
conclude from (10.4) that

$$\mathbb{P}\{\|V_J (R_i)_J - u_i\|_2 \le t\} \le \frac{C}{d(z)} \Big(t + \frac{1}{\sqrt{\delta N}}\Big)^2, \quad t \ge 0.$$

Substituting this into (10.3) and using Tensorization Lemma 3.3, we obtain the following result. 

Theorem 10.3 (Small ball probabilities via real-imaginary correlation). Let $B \in \mathbb{C}^{n \times N}$ be a
random matrix satisfying Assumptions 1.1 and 1.4, and let $\delta \in (0,1)$. Then for a fixed vector
$z \in \mathbb{C}^N$ with $\|z\|_2 = 1$ we have


$$\mathbb{P}\{\|Bz\|_2 \le t\sqrt{n}\} \le \Big[ \frac{C}{d(z)} \Big(t + \frac{1}{\sqrt{\delta n}}\Big)^2 \Big]^{(1-\delta)n}, \quad t \ge 0.$$


10.4. The essentially real case. Theorem 10.3 is useful for vectors $z$ whose real-imaginary
correlations $d(z)$ are not too small. What can be done in the “essentially real” case, where
$d(z)$ happens to be small?

Our strategy will be different in that case. Let us first prove a version of Theorem 10.3 that is not
based on $d(z)$, but where the bound is understandably exponential in $(1 - \delta)n$ rather than $2(1 - \delta)n$. Such
a probability bound will hold for incompressible vectors $z$ (which were introduced in Definition 9.1),
and it will be stronger than the simpler but more general bound of Proposition 9.2.


Theorem 10.4 (Small ball probabilities for general incompressible vectors). Let $B \in \mathbb{C}^{n \times N}$ be
a random matrix satisfying Assumptions 1.1 and 1.4, and let $\delta \in (0,1)$. Then for a fixed vector
$z \in$ Incomp we have

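$$\mathbb{P}\{\|Bz\|_2 \le t\sqrt{n}\} \le \Big[ \frac{C}{\sqrt{\delta}} \Big(t + \frac{1}{\sqrt{\delta n}}\Big) \Big]^{(1-\delta)n}, \quad t \ge 0.$$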

Proof. The argument is somewhat simpler than for Theorem 10.3. Consider the set of small
coordinates sm(z) introduced in Definition 10.1. By definition of that set combined with Markov’s
inequality, and Definition 9.1 of incompressible vectors, we have

$$\|z_{\mathrm{sm}(z)}\|_\infty \le \frac{1}{\sqrt{\delta N}}, \qquad \|z_{\mathrm{sm}(z)}\|_2 \ge c.$$

It follows that there exists a subset $J \subset \mathrm{sm}(z)$ with $|J| = \delta N$ and such that

$$\|z_J\|_\infty \le \frac{1}{\sqrt{\delta N}}, \qquad \|z_J\|_2 \ge c\sqrt{\delta}. \tag{10.7}$$


(The first inequality is trivial, and the second can be obtained by dividing sm(z) into $1/\delta - 1$ blocks
of coordinates of size $\delta N$ each, and arguing by contradiction.)

Since $z_J = x_J + iy_J$, either the real part $x_J$ or the imaginary part $y_J$ has $\ell_2$-norm bounded below by
$c\sqrt{\delta}/2$. Let us assume without loss of generality that $x_J$ satisfies this, so

$$\|x_J\|_\infty \le \frac{1}{\sqrt{\delta N}}, \qquad \|x_J\|_2 \ge \frac{c\sqrt{\delta}}{2}. \tag{10.8}$$


To control $\|Bz\|_2$, we can proceed similarly to the proof of Proposition 9.2, taking as the starting
point the bound

$$\|Bz\|_2^2 \ge \sum_{i \in [n] \setminus J} X_i^2 \quad\text{where}\quad X_i = \sum_{j \in J} R_{ij} x_j + a'_i. \tag{10.9}$$

For each sum defining $X_i$, we can apply the small ball probability bound of Corollary 7.6 with
$L = \sqrt{8/p}$. This gives

$$\mathbb{P}\{|X_i| \le t\} \le \frac{C}{\|x_J\|_2} \Big(t + \frac{1}{D(x_J)}\Big).$$

We can use the two inequalities in (10.8) to get rid of the two terms dependent on $x_J$. Indeed,
Proposition 7.4 and the first inequality in (10.8) yield

$$D(x_J) \ge \frac{1}{2}\sqrt{\delta N}.$$

Using this and the second inequality in (10.8) gives

$$\mathbb{P}\{|X_i| \le t\} \le \frac{C}{\sqrt{\delta}} \Big(t + \frac{1}{\sqrt{\delta n}}\Big).$$

Using this bound for each term of the sum in (10.9) and applying Tensorization Lemma 3.3, we 
complete the proof. □ 

Next, we will show that for vectors with small $d(z)$, not only a $\delta N$ fraction of coordinates but
almost the entire real and imaginary parts are close to each other. This strong constraint intuitively 
means that the set of such vectors is relatively small, and we will indeed construct a small net for 
such vectors later. 

Lemma 10.5 (Real-imaginary correlation). Let $z \in \mathbb{C}^N$ and set $I := \mathrm{sm}(z)$. Then

$$\det(V_I V_I^{\mathsf T})^{1/2} \le \frac{C}{\delta}\, d(z).$$
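
The proof that follows rests on two elementary facts: the Cauchy-Binet expansion of a $2 \times 2$ Gram determinant into squared $2 \times 2$ minors, and a double counting of two-element subsets. Both can be sanity-checked numerically. In the sketch below a random $2 \times N_0$ matrix stands in for $V_I$ and a small integer `k` stands in for $\delta N$; these sizes and the helper name `gram_det` are illustrative assumptions.

```python
import itertools
from math import comb
import numpy as np

rng = np.random.default_rng(1)
N0, k = 7, 3                          # N0 plays the role of |I|, k the role of delta*N
V = rng.standard_normal((2, N0))      # stand-in for the 2 x N0 matrix V_I

def gram_det(M):
    """det(M M^T) for a 2 x m matrix M."""
    return float(np.linalg.det(M @ M.T))

# Cauchy-Binet: det(V V^T) equals the sum of squared 2 x 2 minors of V.
pairs_sum = sum(np.linalg.det(V[:, list(P)]) ** 2
                for P in itertools.combinations(range(N0), 2))

# Summing det(V_J V_J^T) over all k-subsets J counts every pair C(N0-2, k-2) times.
subsets = list(itertools.combinations(range(N0), k))
sum_over_J = sum(gram_det(V[:, list(J)]) for J in subsets)

# The largest term plays the role of d(z)^2 in the lemma.
d_sq = max(gram_det(V[:, list(J)]) for J in subsets)
```

With these quantities, `comb(N0, k) * d_sq` dominates `sum_over_J`, which in turn equals `comb(N0 - 2, k - 2) * gram_det(V)`; this is exactly the chain of (in)equalities used to conclude the lemma.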

Proof. The argument is based on the Cauchy-Binet formula, which yields

$$\det(V_I V_I^{\mathsf T}) = \sum_{I_2 \subset I,\ |I_2| = 2} \det(V_{I_2})^2, \tag{10.10}$$


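where the sum is over all $\binom{|I|}{2}$ two-element subsets of $I$. Similarly, for each set $J$ as in the definition
of $d(z)$, that is for $J \subset I$, $|J| = \delta N$, we can expand

$$\det(V_J V_J^{\mathsf T}) = \sum_{I_2 \subset J,\ |I_2| = 2} \det(V_{I_2})^2.$$

Summing over $J$, we get

$$\sum_{J \subset I,\ |J| = \delta N} \det(V_J V_J^{\mathsf T}) = \sum_{J \subset I,\ |J| = \delta N} \ \sum_{I_2 \subset J,\ |I_2| = 2} \det(V_{I_2})^2.$$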

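To simplify the right hand side, note that every two-element set $I_2 \subset I$ is included in $\binom{N_0 - 2}{\delta N - 2}$ sets
$J$, where we denote $N_0 := |I| = N - \delta N$. Therefore

$$\sum_{J \subset I,\ |J| = \delta N} \det(V_J V_J^{\mathsf T}) = \binom{N_0 - 2}{\delta N - 2} \sum_{I_2 \subset I,\ |I_2| = 2} \det(V_{I_2})^2.$$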


The sum in the right hand side equals $\det(V_I V_I^{\mathsf T})$ by (10.10). Each determinant $\det(V_J V_J^{\mathsf T})$ in the
left hand side is bounded by $d(z)^2$ by definition. This yields

$$\binom{N_0}{\delta N} d(z)^2 \ge \binom{N_0 - 2}{\delta N - 2} \det(V_I V_I^{\mathsf T}).$$

Simplifying this inequality and using that $N_0 = N - \delta N$, we complete the proof. □


11. A NET FOR VECTORS WITH GIVEN LCD AND REAL-IMAGINARY CORRELATIONS


Thanks to Section 9, we can now focus on the set of incompressible vectors. Our goal is to 
construct a net for the set of incompressible vectors z with given least common denominator D(z) 
and real-imaginary correlation d(z). Let us define this set formally. 

Definition 11.1 (Level set for LCD and real-imaginary correlations). For $D, d > 0$, we define
the following subset of $\mathbb{C}^N$:

$$S_{D,d} = \{z \in \mathrm{Incomp} : D/2 < D(z) \le D;\ d(z) \le d\}.$$

Hidden in this definition are the parameters $L$ from the definition of $D(z)$ and $\delta$ from the
definition of $d(z)$, which we assume to be fixed. When we work with level sets $S_{D,d}$, we can
automatically assume that

$$D \ge c_0\sqrt{N}, \tag{11.1}$$

since it is relatively easy to see that $D(v) \ge c_0\sqrt{N}$ for every vector $v \in$ Incomp; see [38].


A first attempt at constructing a small $\gamma$-net of the level set $S_{D,d}$ could be to use the standard
volume argument. For instance, if one chooses $\gamma = \sqrt{N}/D$, the volume argument will yield a net
of cardinality

$$\Big(\frac{CD}{\sqrt{N}}\Big)^{2N}. \tag{11.2}$$


The exponent 2N appears here because the vectors are complex. 

This net is too large for our purposes. Our next, refined, attempt is to leverage the information
about LCD of the vectors in $S_{D,d}$. Indeed, known constructions lead to the existence of a finer net,
namely with $\gamma \ll \sqrt{N}/D$, and still with approximately the same cardinality as in (11.2), see [25].

However, this net would still be too large if we try to use it in combination with the small ball
probability bound given in Theorem 10.3. Our final, successful, refinement of the construction will
use both LCD and the real-imaginary correlation $d(z)$ of the vectors in $S_{D,d}$. Ideally, we would
hope to construct a $\gamma$-net with $\gamma \ll \sqrt{N}/D$ and with cardinality bounded by

$$\Big(\frac{CD}{\sqrt{N}}\Big)^{2N} d^N. \tag{11.3}$$


The correction term $d^N$ will allow the net to become smaller for more “real” vectors - those with
stronger real-imaginary correlations.

Of course, if $d$ is extremely small, such as for purely real vectors, the cardinality in (11.3) is too
good to be true. For purely real vectors, the ideal cardinality would be the same as in (11.2) except
with exponent $N$, that is

$$\Big(\frac{CD}{\sqrt{N}}\Big)^{N}. \tag{11.4}$$


Summarizing, we hope to construct a $\gamma$-net of the level set $S_{D,d}$ for some $\gamma \ll \sqrt{N}/D$, and with
cardinality bounded as in (11.3) if $d$ is not too small (the genuinely complex case) and as in (11.4) if
$d$ is very small (the essentially real case). The following theorem, which is the main result of this
section, provides slightly weaker but still adequate bounds.


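Theorem 11.2 (Nets for level sets). There exist constants $C, c > 0$ such that the following holds.
Assume that $L$ from the definition of $D(z)$ is such that $L \le c\sqrt{N}$, and $\delta$ from the definition of $d(z)$
is such that $\delta \in (0, c)$. Fix $D > 0$, and let

$$\gamma = \frac{L}{D}\sqrt{\log_+ \frac{D}{L}} \quad\text{and}\quad d_0 = C\delta \cdot \max\Big(\gamma, \frac{\sqrt{N}}{D}\Big). \tag{11.5}$$

1. (Genuinely complex case). For any $d \ge d_0$, there exists a $(C\gamma)$-net of the level set $S_{D,d}$ with
cardinality at most

$$\delta^{-N} \gamma^{-2\delta N - 1} \Big(\frac{CD}{\sqrt{N}}\Big)^{2N - \delta N} d^{\,N - \delta N - 1}.$$

2. (Essentially real case). For any $d < d_0$, there exists a $(C\gamma)$-net of the level set $S_{D,d}$ with
cardinality at most

$$\delta^{-\delta N} \gamma^{-2\delta N - 1} \Big(\frac{CD}{\sqrt{N}}\Big)^{N - \delta N + 1}.$$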

To compare this result with the ideal bounds (11.3) and (11.4), let us use it with $L \ll \sqrt{N}$.
Then $\gamma \ll \sqrt{N}/D$ as we needed, and the theorem gives bounds similar to (11.3) and (11.4).

We will prove Theorem 11.2 in the next few subsections. 


11.1. Step 1: setting out the constraints. We will first construct a net for the points in $S_{D,d}$
with given set of small coordinates; in the end we unfix this set using the union bound. So let us
fix a subset of indices

$$I \subset [N] \quad\text{with}\quad |I| = N - \delta N =: N_0$$

and define the following subset of $\mathbb{C}^I$:

$$S_{D,d,I} := \{z_I : z \in S_{D,d},\ \mathrm{sm}(z) = I\}. \tag{11.6}$$

Consider a point $z_I \in S_{D,d,I}$. As before, depending on the situation, we will work with one of
the three representations of $z_I$: via real and imaginary parts $z_I = x_I + iy_I$, via a long real vector
$\tilde{z}_I = \binom{x_I}{y_I} \in \mathbb{R}^{2N_0}$, and via the $2 \times N_0$ real matrix $V_I = \begin{bmatrix} x_I^{\mathsf T} \\ y_I^{\mathsf T} \end{bmatrix}$.

Juxtaposing the available constraints on $z_I$ will help us to construct a small net, so let us set
out precisely what we know about $z_I$. We have three pieces of information.

1. Norm. We know that $\|z\|_2 = 1$ and $z \in$ Incomp. Since $|I^c| = \delta N \le cN$, the coordinates of $z$
in $I$ must have a significant energy, i.e.

$$\|z_I\|_2 \ge c.$$

Since $\|z_I\|_2^2 = \|x_I\|_2^2 + \|y_I\|_2^2$, at least one of these terms is bounded below by $c^2/2$. Let us assume
without loss of generality that it is the first term, which yields

$$\frac{c}{2} \le \|x_I\|_2 \le 1, \qquad \|y_I\|_2 \le 1. \tag{11.7}$$

2. Least common denominator. We know that $D(z) \in (D/2, D]$. By definition, this implies that
there exist $\theta \in [D/2, D]$ and integer points $p, q \in \mathbb{Z}^I$ such that

$$\|\theta x_I - p\|_2 \le L\sqrt{\log_+ \frac{\theta}{L}}, \qquad \|\theta y_I - q\|_2 \le L\sqrt{\log_+ \frac{\theta}{L}}. \tag{11.8}$$

3. Real-imaginary correlation. We know that $d(z) \le d$. By Lemma 10.5, this implies that

$$\det(V_I V_I^{\mathsf T})^{1/2} \le \frac{Cd}{\delta}.$$

On the other hand, the determinant is the product of the singular values, that is

$$\det(V_I V_I^{\mathsf T})^{1/2} = s_1(V_I)\, s_2(V_I).$$

The larger singular value $s_1(V_I)$ is the operator norm $\|V_I\|$, so it is bounded below by the norm
of either of the two rows of $V_I$. Thus $s_1(V_I) \ge \|x_I\|_2 \ge c/2$ due to (11.7). This gives

$$s_2(V_I) \le C\nu, \quad\text{where}\quad \nu := \frac{C'd}{\delta}. \tag{11.9}$$
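
The singular value facts used in the third constraint are easy to confirm numerically. The sketch below uses a random $2 \times 10$ matrix as a stand-in for $V_I$ (an illustrative assumption) and checks that the Gram determinant is the product of the two singular values, that the larger singular value dominates each row norm, and that the smaller one is therefore at most the determinant divided by a row norm.

```python
import numpy as np

rng = np.random.default_rng(2)
V = rng.standard_normal((2, 10))                 # stand-in for the 2 x N0 matrix V_I

s1, s2 = np.linalg.svd(V, compute_uv=False)      # singular values, s1 >= s2
det_half = float(np.sqrt(np.linalg.det(V @ V.T)))
row_norms = np.linalg.norm(V, axis=1)            # norms of the rows x_I and y_I

# Upper bound on s2 obtained from det = s1 * s2 and s1 >= ||x_I||_2.
s2_bound = det_half / row_norms[0]
```

The value `s2_bound` is the numerical counterpart of the step leading to (11.9): a small determinant together with a first row of norm bounded below forces $s_2$ to be small.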

11.2. Step 2: an attempt at construction based on LCD. Let us ignore for a moment the
information about real-imaginary correlation, and try to construct a net for $S_{D,d,I}$ based on the
least common denominator only. Dividing the inequalities in (11.8) by $\theta$ and using that $\theta \ge D/2$,
we obtain

$$\Big\|x_I - \frac{p}{\theta}\Big\|_2 \le \frac{L}{\theta}\sqrt{\log_+ \frac{\theta}{L}} \le \frac{2L}{D}\sqrt{\log_+ \frac{D}{L}} = 2\gamma, \tag{11.10}$$

and similarly

$$\Big\|y_I - \frac{q}{\theta}\Big\|_2 \le 2\gamma. \tag{11.11}$$

This means that $x_I$ and $y_I$ can be approximated by scaled integer points $p/\theta$ and $q/\theta$, respectively.

To count the integer points $p$ and $q$, let us check their norms. By triangle inequality, (11.8)
implies

$$\|p\|_2 \le \|\theta x_I\|_2 + L\sqrt{\log_+ \frac{\theta}{L}} \le 2D + L\sqrt{\log_+ \frac{2D}{L}} \le 3D, \tag{11.12}$$

where we used that $\|x_I\|_2 \le 1$ and $\theta \le D$.

Notice that the bound (11.12) is sharp within an absolute constant. Indeed, a similar reasoning
gives

$$\|p\|_2 \ge \|\theta x_I\|_2 - L\sqrt{\log_+ \frac{\theta}{L}} \ge cD - L\sqrt{\log_+ \frac{D}{L}} \ge c'D.$$

In the second inequality we used that $\|x_I\|_2 \ge c/2$ due to (11.7) and $\theta \ge D/2$. In the last inequality,
we used that $D \ge c_0\sqrt{N}$ due to (11.1) and that $L \le c\sqrt{N}$, choosing $c < c_0$ in the formulation of
the theorem, which ensures that the term $cD$ dominates.

Similarly to (11.12), we obtain

$$\|q\|_2 \le 3D.$$

Summarizing, we have shown that $z_I$ can be approximated by a scaled integer point $p + iq$, where
both $p$ and $q$ have norms at most $3D$. Formally, the set

$$\mathcal{N}_I := \{\alpha(p + iq) : \alpha \in \mathbb{R};\ p, q \in \mathbb{Z}^I \cap B(0, 3D)\}$$

is a $(4\gamma)$-net of $S_{D,d,I}$. How large is this net? Since $D \ge c_0\sqrt{N}$ due to (11.1), a standard volume
argument shows that the number of integer points in the real ball $B(0, 3D)$ in dimension $|I| = N_0$
is bounded by $(CD/\sqrt{N_0})^{N_0}$. Thus the number of “generators” $p + iq$ of the net $\mathcal{N}_I$ is bounded by

$$\Big(\frac{CD}{\sqrt{N_0}}\Big)^{2N_0}. \tag{11.13}$$

Further, one can easily discretize the multipliers $\alpha$ (we will do this later), and obtain a finite net
of $S_{D,d,I}$ of cardinality similar to (11.13).

The bound (11.13) is close to the ideal result (11.3). However, it misses the $d^N$ factor, which is
understandable since we have not used the real-imaginary correlation $d(z)$ yet. Let us do this now.


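11.3. Step 3: factoring in the real-imaginary correlation. Let us rewrite the approximation
bound (11.8) in terms of the $2 \times N_0$ matrices

$$V_I = \begin{bmatrix} x_I^{\mathsf T} \\ y_I^{\mathsf T} \end{bmatrix} \quad\text{and}\quad W := \begin{bmatrix} p^{\mathsf T} \\ q^{\mathsf T} \end{bmatrix}.$$

It follows that $\theta V_I$ is approximated by $W$ in the operator norm:

$$\|\theta V_I - W\| \le 2L\sqrt{\log_+ \frac{\theta}{L}}.$$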

Weyl’s inequality implies that the corresponding singular values of $\theta V_I$ and $W$ are within $2L\sqrt{\log_+ (\theta/L)}$
from each other, and in particular we have

$$s_2(W) \le s_2(\theta V_I) + 2L\sqrt{\log_+ \frac{\theta}{L}}.$$

Recalling from (11.9) that $s_2(V_I) \le C\nu$ and that $\theta \le D$, we conclude that

$$s_2(W) \le CD\nu + 2L\sqrt{\log_+ \frac{D}{L}}. \tag{11.14}$$

We can interpret this inequality as saying that the vectors $p$ and $q$ that form the rows of $W$
are almost collinear. Indeed, let $P_{p^\perp}$ denote the orthogonal projection in $\mathbb{R}^I$ onto the subspace
orthogonal to the vector $p$. We claim that

$$\|P_{p^\perp} q\|_2 \le \Big(1 + \frac{\|q\|_2}{\|p\|_2}\Big) s_2(W). \tag{11.15}$$

To see why this inequality holds, we can express the determinant $\det(WW^{\mathsf T})^{1/2}$ in two ways - via
the base times height formula and as the product of singular values:

$$\det(WW^{\mathsf T})^{1/2} = \|p\|_2 \cdot \|P_{p^\perp} q\|_2 = s_1(W)\, s_2(W). \tag{11.16}$$

The larger singular value $s_1(W)$ is the operator norm of $W$, which is bounded by the sum of the
norms of the rows:

$$s_1(W) \le \|p\|_2 + \|q\|_2.$$

Substituting this into the identity (11.16), we obtain the bound (11.15).
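
The identity (11.16) and the bound (11.15) can be confirmed on a small example. In the sketch below the integer vectors `p` and `q` are arbitrary illustrative choices, with `q` taken close to a multiple of `p` so that the two rows of $W$ are almost collinear.

```python
import numpy as np

p = np.array([3.0, -1.0, 4.0, 1.0, -5.0, 9.0])
q = 2.0 * p + np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0])   # nearly collinear with p
W = np.vstack([p, q])

s1, s2 = np.linalg.svd(W, compute_uv=False)               # singular values, s1 >= s2
det_half = float(np.sqrt(np.linalg.det(W @ W.T)))

# Height of q over the line spanned by p, i.e. the norm of P_{p^perp} q.
q_perp = q - (q @ p) / (p @ p) * p
height = float(np.linalg.norm(q_perp))

norm_p = float(np.linalg.norm(p))
norm_q = float(np.linalg.norm(q))
rhs_11_15 = (1.0 + norm_q / norm_p) * s2                   # right hand side of (11.15)
```

Both expressions for the determinant agree up to rounding, and `height` does not exceed `rhs_11_15`, as claimed.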

To successfully apply the bound (11.15), we recall from Section 11.2 that $\|q\|_2 \le 3D$ and $\|p\|_2 \ge
c'D$, and moreover $s_2(W)$ is bounded as in (11.14). Thus we obtain

$$\|P_{p^\perp} q\|_2 \le C\Big(D\nu + L\sqrt{\log_+ \frac{D}{L}}\Big). \tag{11.17}$$

Intuitively, this means that $p$ and $q$ are almost collinear, with the degree of collinearity measured
by the real-imaginary correlation factor $d(z)$ (which is reflected here through $\nu = C'd/\delta$).


11.4. Step 4: construction of the net in the genuinely complex case. We are now ready to
construct a net of $S_{D,d,I}$ based on both LCD and the real-imaginary correlation. Let us start with
the genuinely complex case of the theorem, where $d \ge d_0$. Using definitions of $d_0$ and $\gamma$ in (11.5)
and recalling that $\nu = C'd/\delta$, we can rewrite the inequality $d \ge d_0$ as

$$D\nu \ge CL\sqrt{\log_+ \frac{D}{L}} \quad\text{and}\quad D\nu \ge C\sqrt{N}. \tag{11.18}$$


By the first inequality, the first term dominates in the bound (11.17), and we have

$$\|P_{p^\perp} q\|_2 \le 2CD\nu. \tag{11.19}$$

Arguing as in Section 11.2, we see that the set

$$\mathcal{N}_I^{(1)} := \{\alpha(p + iq) : \alpha \in \mathbb{R};\ p, q \in \mathbb{Z}^I \cap B(0, 3D);\ \|P_{p^\perp} q\|_2 \le 2CD\nu\}$$

is a $(4\gamma)$-net of $S_{D,d,I}$. The collinearity condition (11.19) included in this definition will allow us to
bound the number of generators $p + iq$ better than before.

First, exactly as in Section 11.2, the number of possible integer points $p$ in the definition of $\mathcal{N}_I^{(1)}$
can be bounded by the standard volume argument, and we have

$$\#\{p\text{'s in the definition of } \mathcal{N}_I^{(1)}\} \le |\mathbb{Z}^I \cap B(0, 3D)| \le \Big(\frac{CD}{\sqrt{N_0}}\Big)^{N_0}. \tag{11.20}$$


Next, for a fixed $p$, let us count the number of possible $q$'s that can make a generator $p + iq$. By
definition of $\mathcal{N}_I^{(1)}$, any such $q$ is an integer point in the cylinder

$$\mathcal{C}(p, 3D, 2CD\nu) \tag{11.21}$$

where we denote

$$\mathcal{C}(p, a, b) := \{u \in \mathbb{R}^I : \|P_p u\|_2 \le a,\ \|P_{p^\perp} u\|_2 \le b\}.$$

(Here obviously $P_p$ denotes the orthogonal projection in $\mathbb{R}^I$ onto the line spanned by $p$.)

By a standard covering argument, the number of integer points in the cylinder $\mathcal{C}(p, a, b)$ is
bounded by the volume of the Minkowski sum

$$\mathcal{C}(p, a, b) + Q \quad\text{where}\quad Q = \Big[-\frac{1}{2}, \frac{1}{2}\Big]^{N_0}.$$

Further, the unit cube $Q$ is contained in the Euclidean ball $B(0, \sqrt{N_0})$, and the Minkowski sum
$\mathcal{C}(p, a, b) + B(0, \sqrt{N_0})$ is clearly contained in the cylinder

$$\mathcal{C}(p, a + \sqrt{N_0}, b + \sqrt{N_0}).$$

This cylinder is a Cartesian product of an interval of length $2(a + \sqrt{N_0})$ and the Euclidean ball of
radius $b + \sqrt{N_0}$ in the hyperplane orthogonal to the interval, so the volume of the cylinder can be
bounded by

$$2(a + \sqrt{N_0}) \cdot \Big(\frac{C(b + \sqrt{N_0})}{\sqrt{N_0}}\Big)^{N_0 - 1}. \tag{11.22}$$
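
The covering argument can be illustrated in low dimension. The following sketch counts integer points in a cylinder by brute force and compares the count with the exact volume of the enlarged cylinder, that is, the length of the interval times the volume of the ball in the orthogonal hyperplane. The function names, the dimension $m = 3$ and the test parameters are illustrative assumptions; the exact ball volume is used here in place of the bound with an unspecified constant $C$.

```python
import itertools
import math
import numpy as np

def ball_volume(m, r):
    """Volume of the Euclidean ball of radius r in R^m."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1) * r ** m

def count_lattice_in_cylinder(p, a, b):
    """Brute-force count of integer points u with |P_p u| <= a and |P_{p^perp} u| <= b."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    e = p / np.linalg.norm(p)                  # unit vector along the axis of the cylinder
    R = int(math.ceil(math.hypot(a, b)))       # every point of the cylinder has norm <= R
    count = 0
    for u in itertools.product(range(-R, R + 1), repeat=m):
        u = np.array(u, dtype=float)
        along = float(u @ e)
        perp = float(np.linalg.norm(u - along * e))
        if abs(along) <= a and perp <= b:
            count += 1
    return count

def enlarged_cylinder_volume(m, a, b):
    """Volume of the cylinder with both radii enlarged by sqrt(m), as in the text."""
    r = math.sqrt(m)
    return 2 * (a + r) * ball_volume(m - 1, b + r)
```

For example, with `p = (1, 2, 2)`, `a = 4` and `b = 1.5`, the count returned by `count_lattice_in_cylinder` stays below `enlarged_cylinder_volume(3, 4, 1.5)`, in line with the covering argument.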


We can apply this bound to our specific cylinder (11.21) where $a = 3D$ and $b = 2CD\nu$. By
(11.1), $a \ge c\sqrt{N_0}$, and by the second inequality in (11.18), $b \ge C\sqrt{N_0}$, which means that both
$\sqrt{N_0}$ terms can be absorbed into $a$ and $b$. Thus the number of integer points in the cylinder (11.21)
is bounded by

$$Ca \Big(\frac{Cb}{\sqrt{N_0}}\Big)^{N_0 - 1} \le CD \Big(\frac{CD\nu}{\sqrt{N_0}}\Big)^{N_0 - 1}. \tag{11.23}$$


Summarizing, now we know the following about the generators $p + iq$ of the net $\mathcal{N}_I^{(1)}$. The
number of possible points $p$ is bounded as in (11.20). For each fixed $p$, the number of possible $q$'s
that can make the generator $p + iq$ is bounded by the quantity in (11.23). Thus, the total number
of generators $p + iq$ is bounded by

$$\Big(\frac{CD}{\sqrt{N_0}}\Big)^{N_0} \cdot CD \Big(\frac{CD\nu}{\sqrt{N_0}}\Big)^{N_0 - 1}. \tag{11.24}$$


11.5. Step 5: finalizing the genuinely complex case. Three minor points still remain to be
addressed in this case. First, the net $\mathcal{N}_I^{(1)}$ we constructed is infinite due to the real multiplier $\alpha$.
Second, this net controls only the coordinates that lie in $I$ (recall the definition (11.6) of the set
$S_{D,d,I}$). Third, the construction we made was for a fixed set of coordinates $I$. We will now take
care of these issues.


11.5.1. Discretizing the multipliers. The first point can be addressed by discretizing the set of
multipliers $\alpha$ in the definition of the net $\mathcal{N}_I^{(1)}$. Since $S_{D,d,I}$ is a subset of the unit ball $B(0,1)$, we
may consider only the multipliers $\alpha$ that satisfy $\|\alpha(p + iq)\|_2 \le 1$. For a fixed generator $p + iq$, we
discretize the interval of multipliers $\{\alpha \in \mathbb{R} : \|\alpha(p + iq)\|_2 \le 1\}$ by replacing it with a set of $2/\gamma$
numbers $\alpha_i$ that are equally spaced in that interval. The vector $\alpha(p + iq)$ can then be approximated
by a vector $\alpha_i(p + iq)$ with error at most $\gamma$ in the Euclidean norm.

Since $\mathcal{N}_I^{(1)}$ is a $(4\gamma)$-net of $S_{D,d,I}$, the discretization we just constructed is a $(6\gamma)$-net of $S_{D,d,I}$.
Let us call this net $\mathcal{M}_I^{(1)}$. The cardinality of $\mathcal{M}_I^{(1)}$ is bounded by $(2/\gamma)$ times the number of
generators of $\mathcal{N}_I^{(1)}$, which we bounded in (11.24). In other words,

$$|\mathcal{M}_I^{(1)}| \le \frac{2}{\gamma} \Big(\frac{CD}{\sqrt{N_0}}\Big)^{N_0} \cdot CD \Big(\frac{CD\nu}{\sqrt{N_0}}\Big)^{N_0 - 1}. \tag{11.25}$$


11.5.2. Controlling the coordinates outside I. The second point we need to address is that since
$\mathcal{M}_I^{(1)}$ is a $(6\gamma)$-net of the set $S_{D,d,I} = \{z_I : z \in S_{D,d},\ \mathrm{sm}(z) = I\}$, this net can only control the
coordinates of $z$ in $I$. To control the coordinates outside $I$, it is enough to construct a separate
$\gamma$-net for the set

$$\{z_{I^c} : z \in S_{D,d},\ \mathrm{sm}(z) = I\}.$$

Since $|I^c| = \delta N$, a standard volume bound allows one to find such a net of cardinality at most
$(5/\gamma)^{2\delta N}$. Combining the two nets, we conclude that there exists a $(7\gamma)$-net of the set

$$\mathcal{T}_{D,d,I} = \{z : z \in S_{D,d},\ \mathrm{sm}(z) = I\}$$

of cardinality at most

$$\Big(\frac{5}{\gamma}\Big)^{2\delta N} |\mathcal{M}_I^{(1)}|.$$

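11.5.3. Unfixing the set of coordinates I. Finally, the third point we need to address is that our
construction was for a fixed set of indices $I$. To unfix $I$, we note that there are at most

$$\binom{N}{\delta N} \le \Big(\frac{e}{\delta}\Big)^{\delta N}$$

ways to choose $I$. So, combining the nets we constructed for each $I$, we obtain a $(7\gamma)$-net of $S_{D,d}$
of cardinality at most

$$\Big(\frac{e}{\delta}\Big)^{\delta N} \Big(\frac{5}{\gamma}\Big)^{2\delta N} |\mathcal{M}_I^{(1)}|.$$

Substituting here the bound (11.25) for $|\mathcal{M}_I^{(1)}|$, recalling that $N_0 = N - \delta N$ and $\nu = C'd/\delta$, and
simplifying the expression, we prove the first part of the theorem.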


11.6. Step 5: the essentially real case. We proceed to the essentially real case, where $d < d_0$. This means that at least one of the inequalities in (11.18) fails.


11.6.1. Case 1. Assume that the first inequality in (11.18) holds but the other fails, that is

$$D\nu \ge CL\sqrt{\log_+ \frac{D}{L}} \quad \text{and} \quad D\nu < C\sqrt{N}.$$

We proceed in the same way as in the genuinely complex case until we apply the general bound on the integer points (11.22) to our cylinder with $a = 4D$ and $b = 2CD\nu$. This is the only place where we used the second inequality in (11.18), which now fails. This means that $b$ gets absorbed into the $\sqrt{N_0}$ term, and the number of integer points in the cylinder (11.21) is consequently bounded by

$$Ca \left(\frac{C\sqrt{N_0}}{\sqrt{N_0}}\right)^{N_0 - 1} \le C^N D.$$

Using this bound in place of (11.23) and arguing exactly as in the genuinely complex case, we complete the proof for this sub-case.

11.6.2. Case 2. The remaining sub-case is where the first inequality in (11.18) fails, that is

$$D\nu < L\sqrt{\log_+ \frac{D}{L}}. \tag{11.26}$$

Then the second term dominates in the bound (11.17), and we have

$$\|P_{p^\perp} q\|_2 \le 2C'L\sqrt{\log_+ \frac{D}{L}}. \tag{11.27}$$

Let us fix a point $z_I = x_I + iy_I$ from $S_{D,d,I}$. Since the orthogonal projection has norm one, (11.11) yields

$$\left\| P_{p^\perp} y_I - \frac{1}{\theta} P_{p^\perp} q \right\|_2 \le \gamma.$$

Combining this with (11.27) and using triangle inequality, we obtain

$$\|P_{p^\perp} y_I\|_2 \le \gamma + \frac{2C'L}{\theta}\sqrt{\log_+ \frac{D}{L}}.$$

Recalling that $\theta \ge D$ and the definition of $\gamma$ in the theorem, we obtain

$$\|P_{p^\perp} y_I\|_2 \le C\gamma.$$

We can interpret this inequality as follows. There exists a multiplier $\beta \in \mathbb{R}$ such that

$$\|y_I - \beta p\|_2 \le C\gamma.$$

Let us rewrite (11.10) in a similar way: there exists a multiplier $\alpha = 1/\theta \in \mathbb{R}$ such that

$$\|x_I - \alpha p\|_2 \le \gamma.$$

Recalling that $z_I = x_I + iy_I$, it follows that

$$\|z_I - (\alpha + i\beta)p\|_2 \le C\gamma.$$

Furthermore, recalling from (11.12) that $\|p\|_2 \le 3D$, we conclude that the set

$$\mathcal{N}_I^{(2)} := \{(\alpha + i\beta)p : \alpha, \beta \in \mathbb{R};\ p \in \mathbb{Z}^I \cap B(0, 3D)\}$$

is a $(C\gamma)$-net of $S_{D,d,I}$.

The number of generators $p$ can be counted by a volume argument, exactly as in (11.20):

$$\#\{p\text{'s in the definition of } \mathcal{N}_I^{(2)}\} \le |\mathbb{Z}^I \cap B(0, 3D)| \le \left(\frac{CD}{\sqrt{N_0}}\right)^{N_0}. \tag{11.28}$$

Finally, we can discretize the multipliers $\alpha + i\beta$ and unfix the set $I$ similarly to how we did it in Section 11.5. We obtain a $(6\gamma)$-net of $S_{D,d}$ of cardinality at most

$$\left(\frac{e}{\delta}\right)^{\delta N} \left(\frac{5}{\gamma}\right)^{2\delta N} \left(\frac{C}{\gamma}\right)^{2} \left(\frac{CD}{\sqrt{N_0}}\right)^{N_0}.$$

(To recall, the first term here comes from unfixing $I$, the second from controlling coordinates outside $I$, the third from discretizing the multipliers in the complex disc $\{\alpha + i\beta \in \mathbb{C} : \|(\alpha + i\beta)p\|_2 \le 1\}$, and the fourth from (11.28).)

Recalling that $N_0 = N - \delta N$ and $\nu = C'd/\delta$, and simplifying the expression, we prove the second part of Theorem 11.2. □

12. Structure of kernels, and proof of Theorem 8.1 on the distances

In this section we will show that random subspaces, and specifically the kernels of random matrices, are arithmetically unstructured, which means that they have large LCD with high probability. The following is the main result of this section.

Theorem 12.1 (Kernels of random matrices are unstructured). Let $B \in \mathbb{C}^{n \times N}$ be a random matrix satisfying Assumptions 1.1 and 1.4, and assume that $n = (1 - \varepsilon)N$ for some $\varepsilon \in (2/n, c)$. Set $L := \sqrt{\varepsilon N}$. Then,

$$\mathbb{P}\left\{ D(\ker B, L) \le \min\left(\sqrt{N} e^{c/\sqrt{\varepsilon}},\ \varepsilon N\right) \text{ and } \mathcal{B}_{B,M} \right\} \le e^{-cN}.$$

This theorem will follow by balancing the two forces - the small ball probability estimates of 
Section 10 and the net for vectors with given LCD of Section 11. 

It would be convenient to first state a preliminary version of Theorem 12.1 which holds for 
vectors with given levels of LCD D(z) and real-imaginary correlation d(z). We will work here with 
somewhat smaller level sets than So.d from Section 11. For D,d > 0, we consider 

S jo,d = {z G Incomp : D/2 < D(z ) < D\ d/2 < d(z ) < d} . 

Clearly, So,d is the union of the sets So,d for all d < do- 

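Proposition 12.2 (Kernels and level sets). Let $B \in \mathbb{C}^{n \times N}$ be a random matrix satisfying Assumptions 1.1 and 1.4, and let $n = (1 - \varepsilon)N$ for some $\varepsilon \in (2/n, c)$. Set $L$ from the definition of $D(z)$ to be $L := \sqrt{\varepsilon N}$. Let

$$D \le \min\left(\sqrt{N} e^{c/\sqrt{\varepsilon}},\ \varepsilon N\right)$$

and let $d_0$ be the threshold value from (11.5) for $\delta = C\sqrt{\varepsilon}$.

1. (Genuinely complex case). For any $d \in [d_0, 1]$, we have

$$\mathbb{P}\{S_{D,d} \cap \ker B \ne \emptyset \text{ and } \mathcal{B}_{B,M}\} \le e^{-N}. \tag{12.1}$$

2. (Essentially real case). For any $d \in [0, d_0]$, we have

$$\mathbb{P}\{S_{D,d} \cap \ker B \ne \emptyset \text{ and } \mathcal{B}_{B,M}\} \le e^{-N}.$$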

Note here that in the genuinely complex case, we use an additional stratification by d(z) by 
considering the sets So,d, while in the essentially real case, we treat the set So,d in one strike. 

We will prove this proposition in the next two subsections. 

12.1. Proof of Proposition 12.2 in the genuinely complex case. 

12.1.1. Step 1: combining the small ball probability with the net. We fix a vector $z \in S_{D,d}$ and apply Theorem 10.3. By the assumptions on $n$, we can write the conclusion of this theorem as follows:

$$\mathbb{P}\{\|Bz\|_2 \le t\sqrt{N}\} \le \left[\frac{C}{d}\left(t + \frac{1}{\sqrt{\delta N}}\right)^{2}\right]^{(1-\delta)(1-\varepsilon)N}, \quad t > 0. \tag{12.2}$$

Let us apply this bound for $t := \lambda\sqrt{\delta}$, where $\lambda \in (0,1)$ is a parameter whose value we choose later. If we assume that

$$\lambda\sqrt{\delta} \ge \frac{1}{\sqrt{\delta N}}, \tag{12.3}$$

then the probability bound can be expressed as

$$\mathbb{P}\{\|Bz\|_2 \le \lambda\sqrt{\delta N}\} \le \left(\frac{C\lambda^2\delta}{d}\right)^{(1-\delta)(1-\varepsilon)N}.$$

Next, Theorem 11.2 provides us with a $(C\gamma)$-net of $S_{D,d}$ of controlled cardinality. Clearly, the same is true for the smaller level set considered here. Let us denote such a net by $\mathcal{N}$. Using a union bound, we can combine the probability bound with the net as follows:

$$p := \mathbb{P}\left\{ \inf_{z \in \mathcal{N}} \|Bz\|_2 \le \lambda\sqrt{\delta N} \right\} \le |\mathcal{N}| \left(\frac{C\lambda^2\delta}{d}\right)^{(1-\delta)(1-\varepsilon)N};$$


recalling the bound on \J\f\ given by Theorem 11.2, we obtain 


\ d J 


(12.4) 


Our goal is to show that $p \le e^{-N}$.


12.1.2. Step 2: simplifying the probability bound. Assume that $\delta$ (which we recall is a parameter from the definition of $d(z)$) is chosen such that

$$2\varepsilon \le \delta \le \sqrt{\varepsilon}. \tag{12.5}$$

We claim that $d$, $\delta$ and $\gamma$ can be removed from (12.4) at the cost of increasing the bound by $C^N$. To see this for $d$, note that the exponent of $d$ in the bound (12.4) is

$$(N - \delta N - 1) - (1 - \delta)(1 - \varepsilon)N = (1 - \delta)\varepsilon N - 1 \ge \varepsilon N/2 - 1 > 0.$$

Since $d \in [0,1]$, removing $d$ can only make the bound (12.4) larger.

Similarly, the exponent of $1/\delta$ in the bound (12.4) is

$$N - (1 - \delta)(1 - \varepsilon)N \le 2\delta N.$$

Thus $\delta$ contributes to the bound a factor not larger than $(1/\delta)^{2\delta N} \le C^N$.
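The exponent bookkeeping above is easy to get wrong by a sign, so here is a small sanity check in exact rational arithmetic. It is only an illustration: the function names and the sample values of $N$, $\varepsilon$, $\delta$ are hypothetical, chosen to satisfy (12.5). The last assertion uses the elementary fact that $x \log(1/x) \le 1/e$ on $(0,1]$, which gives $(1/\delta)^{2\delta} \le e^{2/e}$ and hence a factor of the form $C^N$.

```python
from fractions import Fraction
import math

def exponent_of_d(N, delta, eps):
    """Exponent of d in (12.4): (N - delta*N - 1) - (1 - delta)(1 - eps)N."""
    return (N - delta * N - 1) - (1 - delta) * (1 - eps) * N

def exponent_of_inv_delta(N, delta, eps):
    """Quantity used in the text for the exponent of 1/delta in (12.4)."""
    return N - (1 - delta) * (1 - eps) * N

# Hypothetical sample values with 2*eps <= delta <= sqrt(eps), as in (12.5).
N, eps, delta = 1000, Fraction(1, 100), Fraction(1, 10)

d_exp = exponent_of_d(N, delta, eps)
assert d_exp == (1 - delta) * eps * N - 1          # the identity in the text
assert d_exp >= eps * N / 2 - 1 > 0                # uses delta <= 1/2
assert exponent_of_inv_delta(N, delta, eps) <= 2 * delta * N  # uses eps <= delta/2

# Per-coordinate cost of removing delta: (1/delta)^(2*delta) <= e^(2/e).
per_coordinate = float(1 / delta) ** float(2 * delta)
assert per_coordinate <= math.exp(2 / math.e)
```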

Finally, to evaluate the contribution of $\gamma$, let us denote

$$\tilde{D} := \frac{D}{L} = \frac{D}{\sqrt{\varepsilon N}}.$$

Notice in passing that $\tilde{D} \ge e$ since $D \ge c_0\sqrt{N}$ by (11.1). Recalling the definition of $\gamma$ in (11.5), we have

$$\gamma = \frac{\sqrt{\log_+ \tilde{D}}}{\tilde{D}} \ge \frac{1}{\tilde{D}}.$$

Therefore, the contribution of $\gamma$ to the bound (12.4) is a factor not larger than

$$\gamma^{-2\delta N - 1} \le \tilde{D}^{4\delta N}.$$

To bound this quantity further, we can use the assumption on $D$ and the comparison (12.5), which imply

$$\tilde{D} \le \frac{1}{\sqrt{\varepsilon}} e^{c/\sqrt{\varepsilon}} \le \frac{1}{\delta} e^{c/\delta}. \tag{12.6}$$

This implies that $\tilde{D}^{4\delta N} \le C^N$. Therefore, $\gamma$ contributes to the bound (12.4) a factor not larger than $C^N$.

We have shown that $d$, $\delta$ and $\gamma$ can be removed from (12.4) at the cost of increasing the bound by $C^N$. In other words, we have

p < 


CA 2(l-5)(l- £ ) S 


Solving for $\lambda$, we see that the desired bound $p \le e^{-N}$ holds whenever


A < 

which in turn holds if A is chosen so that 


/ C \ 2(l- e )(l -S) 

\^£d) 


A < 


/ c \ l+2<5 


(12.7) 


12.1.3. Step 3: approximation by the net. In the previous step we showed that the event

$$\inf_{z_0 \in \mathcal{N}} \|Bz_0\|_2 > \lambda\sqrt{\delta N}$$

holds with probability at least $1 - e^{-N}$, as long as the parameter $\lambda$ satisfies (12.3) and (12.7). Let us fix a realization of the random matrix $B$ for which this event does hold.

Fix a vector $z \in S_{D,d}$. To finish the proof of (12.1), we need to show that $\|Bz\|_2 > 0$. Let us choose $z_0 \in \mathcal{N}$ which best approximates the vector $z$; by definition of $\mathcal{N}$ we have

$$\|z - z_0\|_2 \le C\gamma = \frac{C\sqrt{\log_+ \tilde{D}}}{\tilde{D}}.$$

Assume that the event $\mathcal{B}_{B,M}$ occurs. By triangle inequality, it follows that

$$\|Bz\|_2 \ge \|Bz_0\|_2 - \|B\| \cdot \|z - z_0\|_2 > \lambda\sqrt{\delta N} - M\sqrt{N} \cdot \frac{C\sqrt{\log_+ \tilde{D}}}{\tilde{D}}.$$

This quantity is positive, as we desired, if $\lambda$ satisfies

$$\lambda \ge \frac{C\sqrt{\log_+ \tilde{D}}}{\sqrt{\delta}\,\tilde{D}}. \tag{12.8}$$

Recall that we allow our constants to depend on $M$. This allows us to absorb $M$ into $C$ in the inequality above.


12.1.4. Step 4: final choice of the parameters. We have shown that the conclusion (12.1) of the proposition in the genuinely complex case holds if we can choose the parameters $\delta$ and $\lambda$ in such a way that they satisfy (12.5), (12.3), (12.7) and (12.8). We will now check that such a choice indeed exists.

First, let us choose $\lambda$ just large enough to satisfy (12.8); thus we set

$$\lambda := \frac{C\sqrt{\log_+ \tilde{D}}}{\sqrt{\delta}\,\tilde{D}}.$$

To check (12.3), we use the assumption that $D \le \varepsilon N$, which implies that $\tilde{D} \le \sqrt{\varepsilon N}$. This and the choice of $\lambda$ imply

$$\lambda\sqrt{\delta} \ge \frac{1}{\tilde{D}} \ge \frac{1}{\sqrt{\varepsilon N}} \ge \frac{1}{\sqrt{\delta N}},$$

where the last inequality follows from (12.5). This proves (12.3).

It remains to check (12.7), which takes the form

$$\frac{C\sqrt{\log_+ \tilde{D}}}{\sqrt{\delta}\,\tilde{D}} \le \left(\frac{c}{\sqrt{\varepsilon}\,\tilde{D}}\right)^{1+2\delta}.$$

Rearranging the terms and dropping the term $(1/\sqrt{\varepsilon})^{2\delta}$, which is smaller than an absolute constant due to (12.5), we may rewrite this restriction as

$$\tilde{D}^{4\delta} \log_+ \tilde{D} \le \frac{c\delta}{\varepsilon}.$$

Substituting here the assumption $D \le \sqrt{N} e^{c/\sqrt{\varepsilon}}$, which we already used in (12.6), we see that the restriction is satisfied if

$$\frac{C}{\delta} \le \frac{c\delta}{\varepsilon}.$$

It remains to choose $\delta := C\sqrt{\varepsilon}$; then the restriction is satisfied, and we have verified (12.7). This finishes the proof of the proposition in the genuinely complex case.


12.2. Proof of Proposition 12.2 in the essentially real case. The argument is similar to, and even simpler than, the one in the genuinely complex case. We just need to use the appropriate small ball probability bound, namely Theorem 10.4 instead of (12.2), and the corresponding bound on the net, namely the one from the essentially real case in Theorem 11.2. This leads to the following variant of (12.4):


p < 5N+1 . 


Vn' 

Like before, we can remove $\delta$ and $\gamma$, simplifying the bound to


P < 


CA (i-5)(i- £ ) (c^D) 


fvT-5' 


N 


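Then the desired bound $p \le e^{-N}$ holds if $\lambda$ is chosen so that

$$\lambda \le \left(\frac{c}{\sqrt{\varepsilon}\,\tilde{D}}\right)^{\frac{1}{1-\varepsilon}}.$$

In particular, this holds if $\lambda$ satisfies the same restriction as in the genuinely complex case, namely (12.7). (To see this, note that $1 + 2\delta \ge 1/(1 - \varepsilon)$ by (12.5).)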

The rest of the proof is exactly as in the genuinely complex case. Proposition 12.2 is proved. □ 
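The exponent comparison $1 + 2\delta \ge 1/(1-\varepsilon)$ used in the essentially real case can be verified directly. The sketch below is illustrative only; the helper name and the grid of test values are hypothetical. It checks the worst case $\delta = 2\varepsilon$ allowed by (12.5), where the claim reduces to $(1 + 4\varepsilon)(1 - \varepsilon) \ge 1$, that is, $3\varepsilon - 4\varepsilon^2 \ge 0$.

```python
from fractions import Fraction

def real_case_exponent_ok(eps, delta):
    """Check 1 + 2*delta >= 1/(1 - eps) in exact arithmetic."""
    return 1 + 2 * delta >= 1 / (1 - eps)

# Worst case under (12.5): delta as small as allowed, delta = 2*eps.
# The constraint 2*eps <= sqrt(eps) restricts eps to (0, 1/4].
worst_case_holds = all(
    real_case_exponent_ok(Fraction(k, 1000), 2 * Fraction(k, 1000))
    for k in range(1, 251)
)
```

Since the left-hand side increases with $\delta$, the inequality then holds for every $\delta$ in the range permitted by (12.5).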


12.3. Proof of Theorem 12.1. For convenience, let us denote

$$D_0 := \min\left(\sqrt{N} e^{c/\sqrt{\varepsilon}},\ \varepsilon N\right).$$

Assume that $D(\ker B, L) \le D_0$, and the event $\mathcal{B}_{B,M}$ occurs. This means that there exists $z \in S^{N-1}$ such that

$$z \in \ker B, \quad D(z) \le D_0.$$

We can bound the probability of this event by considering the following cases. If $z$ is compressible, then such an event holds with probability at most $e^{-c_1 N}$ according to Proposition 9.4. Assume that $z$ is incompressible. In the genuinely complex case where $d(z) \ge d_0$, the vector $z$ must belong to a level set $S_{D,d}$ for some $D \in [c_0\sqrt{N}, D_0]$ and $d \in [d_0, 1]$. (The lower bound on $D$ here is from (11.1).) For given $D$ and $d$, the probability that such $z$ exists is at most $e^{-c_2 N}$ by the first part of Proposition 12.2. In the remaining, essentially real case where $d(z) < d_0$, the vector $z$ must belong to a level set $S_{D,d_0}$ for some $D \in [c_0\sqrt{N}, D_0]$. For given $D$ and $d$, the probability that such $z$ exists is at most $e^{-c_2 N}$ by the second part of Proposition 12.2.

This reasoning shows that the probability that $D(\ker B, L) \le D_0$ is bounded by

$$e^{-c_1 N} + \sum_{\substack{D \in [c_0\sqrt{N}, D_0] \\ \text{dyadic}}} \ \sum_{\substack{d \in [d_0, 1] \\ \text{dyadic}}} e^{-c_2 N} + \sum_{\substack{D \in [c_0\sqrt{N}, D_0] \\ \text{dyadic}}} e^{-c_3 N}. \tag{12.9}$$

The definitions of the level sets allowed us here to discretize the ranges of $D$ and $d$ by including only the dyadic values in the sum, namely the values of the form $2^k$, $k \in \mathbb{Z}$.

It remains to bound the number of terms in the sums. The number of dyadic values in the interval $[c_0\sqrt{N}, D_0]$ is at most

$$\log\left(\frac{D_0}{c_0\sqrt{N}}\right) \lesssim \log N$$

since $D_0 \le N$ by definition. Similarly, using the definition (11.5) of $d_0$, we see that the number of dyadic values in the interval $[d_0, 1]$ is at most

$$\log\left(\frac{1}{d_0}\right) \lesssim \log N.$$

Therefore, the probability estimate (12.9) is bounded by

$$e^{-c_1 N} + \log^2(N)\, e^{-c_2 N} + \log(N)\, e^{-c_3 N} \le e^{-cN}.$$

This completes the proof. □
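To make the dyadic counting concrete, the following sketch lists the dyadic values $2^k$ in an interval and compares their number with the logarithmic bound. It is an illustration under assumed values: the function name `dyadic_values` and the sample choices $N = 10^6$, $c_0 = 1/2$ are hypothetical and do not come from the paper.

```python
import math

def dyadic_values(lo, hi):
    """List all values 2**k (k an integer) in [lo, hi], assuming 0 < lo <= hi."""
    # Start slightly below the target and step up, to avoid rounding issues in log2.
    k = math.floor(math.log2(lo)) - 1
    while 2.0 ** k < lo:
        k += 1
    out = []
    while 2.0 ** k <= hi:
        out.append(2.0 ** k)
        k += 1
    return out

# Hypothetical parameters: N = 10**6, c_0 = 1/2, and D_0 <= N.
N = 10 ** 6
levels_D = dyadic_values(0.5 * math.sqrt(N), N)

# At most log2(hi/lo) + 1 dyadic values fit in [lo, hi].
assert len(levels_D) <= math.log2(N / (0.5 * math.sqrt(N))) + 1
```

Because each level set covers a factor-two window of $D$ (respectively $d$), every admissible vector falls into one of these logarithmically many dyadic levels, which is what keeps the union bound in (12.9) harmless.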


12.4. Distances between random vectors and subspaces. We are ready to prove Theorem 8.1 
on the distances between random vectors and subspaces. It can be quickly deduced by combining 
small ball probability bounds we developed in Section 7 with the bound on LCD for random 
subspaces, namely Theorem 12.1. 

Given a random vector $Y \in \mathbb{R}^k$ and an event $\Omega$, denote

$$\mathcal{L}_\Omega(Y, r) = \sup_{y \in \mathbb{R}^k} \mathbb{P}\{\|Y - y\|_2 < r \text{ and } \Omega\}.$$

In Section 8.1, we reduced Theorem 8.1 to a problem over reals. Indeed, according to (8.3), it 
suffices to bound 

Po ■= £b bm {p^z,2tVIn). 

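For this, we recall that $\dim(\ker B) = 2\varepsilon N$ and apply Corollary 7.10, which yields

$$P_0 \le \left(\frac{CL}{\sqrt{\varepsilon N}}\right)^{2\varepsilon N} \left(\tau + \frac{\sqrt{\varepsilon N}}{D(\ker B, L)}\right)^{2\varepsilon N}. \tag{12.10}$$

Next, Theorem 12.1 states that for $L = \sqrt{\varepsilon N}$, with probability at least $1 - e^{-cN}$ we have

$$D(\ker B, L) \ge \min\left(\sqrt{N} e^{c/\sqrt{\varepsilon}},\ \varepsilon N\right).$$

Substituting this into (12.10) and simplifying the bound, we obtain

$$P_0 \le \left[C\left(\tau + \frac{1}{\sqrt{\varepsilon N}} + e^{-c/\sqrt{\varepsilon}}\right)\right]^{2\varepsilon N} + e^{-cN} \le \left[C\left(\tau + \frac{1}{\sqrt{\varepsilon N}} + e^{-c'/\sqrt{\varepsilon}}\right)\right]^{2\varepsilon N}.$$

This inequality combined with (8.3) completes the proof of Theorem 8.1.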


□ 


13. Proof of Theorem 6.1 on invertibility for general distributions 

The strategy of the proof of Theorem 6.1 will be very close to the argument we gave for continuous 
distributions in Section 5. However, there are two important differences. First, the distance bound 
for continuous distributions given in Lemma 5.4 can not hold for general distributions; we will 
replace it by Theorem 8.1. Another ingredient that is not available for general distributions is the 
lower bound given in Lemma 5.5 and all of its consequences in Section 5.5. Instead, we will use 
the small ball probability bounds for general distributions that we developed in Section 7.3. Let us start with this latter task.

13.1. G is bounded below on the small subspace $E^-$. Here we extend the argument of Section 5.5 to general distributions.

Lemma 13.1. With probability at least $1 - e^{-cn}$, we have $S_{E^-} \subset \mathrm{Incomp}$.

Proof. The definition of $E^-$ in Section 5.4.2 implies that

$$\|Bz\|_2 \le C\tau\varepsilon\sqrt{n} \quad \text{for all } z \in S_{E^-}.$$

On the other hand, Proposition 9.4 states that with probability at least $1 - e^{-cn}$,

$$\|Bz\|_2 \ge c\sqrt{n} \quad \text{for all } z \in \mathrm{Comp}.$$

Since these two bounds cannot hold together for the same $z$, it follows that the sets $S_{E^-}$ and Comp are disjoint. This proves the lemma. □

The following result is a version of Lemma 5.5 for general distributions. 

Lemma 13.2 (Lower bound for a fixed row and vector). Let $G_j$ denote the $j$-th row of $G$. Then for each $j$, $z \in \mathrm{Incomp}$, and $\theta > 0$, we have



Proof. Fix z = x + iy and let J be the set of indices of all except cn largest (in the absolute value) 
coordinates of z. By Markov’s inequality and Definition 9.1 of incompressible vectors, we have 



Since $z_J = x_J + iy_J$, either the real part $x_J$ or the imaginary part $y_J$ has $\ell_2$-norm bounded below by $c/2$. Let us assume without loss of generality that $x_J$ satisfies this, so

$$\|x_J\|_\infty \le \frac{C}{\sqrt{n}}, \qquad \|x_J\|_2 \ge c. \tag{13.1}$$

The first inequality and Proposition 7.4 imply that

$$D_1(x_J) \ge c\sqrt{n}.$$

Here we use notation $D_1(\cdot)$ for the LCD in dimension $m = 1$ and with $L \sim 1$; note that it is distinct from the LCD in dimension $m = 2$ we studied in the major part of this paper.

We proceed similarly to the proof of Lemma 5.5. Decomposing the random vector $Z := G_j$ as $Z = X + iY$, we obtain the bound (5.13):

$$\mathbb{P}\{|\langle Z, z \rangle| \le \theta\} \le \mathcal{L}(\langle X, x \rangle, \theta).$$

Further, we use the restriction property of the concentration function (Lemma 3.2) followed by the small ball probability bound (Corollary 7.6), and obtain



This completes the proof of the lemma. 


□ 


Using Tensorization Lemma 3.3 exactly as we did before in Section 5.5.1, we obtain the following 
version of Lemma 5.6 for general distributions. 

Lemma 13.3 (Lower bound for a fixed vector). For each $x \in \mathrm{Incomp}$ and $\theta > 0$, we have

In particular, if $\theta \ge 1/\sqrt{n}$, then the probability is further bounded by $(C\theta)^{\varepsilon n}$. This is similar to the bound we had in Lemma 5.6 for the continuous case. Using this observation, we deduce the following version of Lemma 5.7 for general distributions, and with the same proof.


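Lemma 13.4 (Lower bound on a subspace). Let $M \ge 1$ and $\mu \in (0,1)$. Let $E$ be a fixed subspace of $\mathbb{C}^n$ of dimension at most $\mu\varepsilon n$, and such that $S_E \subset \mathrm{Incomp}$. Then, for every $\theta \ge 1/\sqrt{n}$, we have

$$\mathbb{P}\left\{ \inf_{x \in S_E} \|Gx\|_2 < \theta\sqrt{\varepsilon n} \text{ and } \mathcal{B}_{G,M} \right\} \le \left[C (M/\sqrt{\varepsilon})^{2\mu}\, \theta^{1-2\mu}\right]^{\varepsilon n}.$$

This lemma implies a lower bound on the smallest singular value of $G$ restricted to the set $S_{E^-}$, similar to Lemma 5.7.

Corollary 13.5. Let $M \ge 1$ and $\mu \in (0,1)$. Then, for every $\theta \ge 1/\sqrt{n}$, we have

$$\mathbb{P}\left\{ \inf_{x \in S_{E^-}} \|Gx\|_2 < \theta\sqrt{\varepsilon n} \text{ and } S_{E^-} \subset \mathrm{Incomp} \text{ and } \mathcal{D}_{E^-} \cap \mathcal{B}_{G,M} \right\} \le \left[C (M/\sqrt{\varepsilon})^{2\mu}\, \theta^{1-2\mu}\right]^{\varepsilon n}.$$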

13.2. Proof of Theorem 6.1. The argument will follow the same lines as in Section 5 for continuous distributions; here we will only indicate necessary modifications.

Without loss of generality, we may assume that

$$\varepsilon \ge n^{-0.4}. \tag{13.2}$$

Indeed, (6.1) implies that $t^{0.4}\varepsilon^{-1.4} \ge \varepsilon^{-1} n^{-0.4}$, so the statement of Theorem 6.1 becomes vacuous whenever (13.2) does not hold.

Since dist > distby (5.3), Theorem 8.1 implies that 

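$$\mathbb{P}\{\operatorname{dist}(B_j, \operatorname{Im}(H_j)) < \tau\sqrt{\varepsilon n} \text{ and } \mathcal{B}_{A,M}\} \le (C\tau)^{\varepsilon n} \tag{13.3}$$

for any

$$\tau \ge \frac{1}{\sqrt{\varepsilon n}} + e^{-c/\sqrt{\varepsilon}} =: \tau_0.$$

Consider the random variables

$$Y_j := \left[\max\left(\operatorname{dist}(B_j, H_j),\ \tau_0\sqrt{\varepsilon n}\right)\right]^{-2} \cdot \mathbf{1}_{\mathcal{B}_{A,M}}$$

and argue as in Section 5.4.1. We see that $Y_j$ belong to weak $L^p$ for $p = \varepsilon n/2$ and $\|Y_j\|_{p,\infty} \le C^2/(\varepsilon n)$. By weak triangle inequality, this yields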




< (Cr) £n , r > 0. 


Therefore, 


n i ] n 

dist (Bj,Hj )~ 2 > — — and Bam / < ( Cr) sn + ^ P {dist (Bj,Hj )~ 2 / Yj and Ba,m} 

) 

n 

< ( Cr) en + ^P {dist (Bj,Hj) < T§\fen and Ba,m} ■ 
5=1 

Using again (13.3) and then (13.2), we see that this probability can be further bounded by

$$(C\tau)^{\varepsilon n} + n(C\tau_0)^{\varepsilon n} \le (C_1\tau)^{\varepsilon n} \quad \text{for } \tau \ge \tau_0.$$

Defining the subspaces $E^+$, $E^-$ and the event $\mathcal{D}_{E^-}$ as in Section 5.4.2, we derive from this that

$$\mathbb{P}\{(\mathcal{D}_{E^-})^c \text{ and } \mathcal{B}_{A,M}\} \le (C_1\tau)^{\varepsilon n} \quad \text{for } \tau \ge \tau_0. \tag{13.4}$$

We finish the proof as in Section 5.6.2. Set


r = y/t and 


6 = 


Cy/t 


Then (6.1) ensures that $\tau \ge \tau_0$, so (13.4) holds. Furthermore, (6.1) and (13.2) guarantee that $\theta \ge 1/\sqrt{n}$, hence Corollary 13.5 applies. Similarly to (5.18), we use Corollary 13.5, Lemma 13.1, and (13.4) to obtain


( Ct 

< t\Jn and Ba,m} < P < sq < — • y/n and Ba,m 


< P {sg < 0 ■ yfen and S E - C Incomp and V E ~ n Ba,m} 
+ P {S E - </ Incomp and Ba,m} + P {(£>£-) c and Ba,m} 

< (C£-° m 6°- 9 y n + e ~ cn + (C'w) 2 ” 

< [Ce- 1A t 0A5 ] £n + e~ cn + ( Ct°- 5 Y n . 


It remains to check that the last two terms of the expression above can be absorbed into the first one. This is obvious for the third term, and follows from (6.1) for the second one as we assumed that $\varepsilon < c$. This completes the proof of Theorem 6.1. □